Abstract

We investigate the computation of Csiszar's bounds for the joint source-channel
coding (JSCC) error exponent, $E_J$, of a communication system consisting of a
discrete memoryless source and a discrete memoryless channel. We provide equivalent
expressions for these bounds and derive explicit formulas for the rates where the
bounds are attained. These equivalent representations can be readily computed for
arbitrary source-channel pairs via Arimoto's algorithm. When the channel's
distribution satisfies a symmetry property, the bounds admit closed-form parametric
expressions. We then use our results to provide a systematic comparison between the
JSCC error exponent $E_J$ and the tandem coding error exponent $E_T$, which applies
if the source and channel are separately coded. It is shown that $E_T \le E_J \le 2E_T$.
We establish conditions for which $E_J > E_T$ and for which $E_J = 2E_T$. Numerical
examples indicate that $E_J$ is close to $2E_T$ for many source-channel pairs. This gain
translates into a power saving larger than 2 dB for a binary source transmitted over
additive white Gaussian noise channels and Rayleigh fading channels with finite
output quantization. Finally, we study the computation of the lossy JSCC error
exponent under the Hamming distortion measure.

Index Terms: Joint source-channel coding, tandem source and channel coding, error exponent,
reliability function, Fenchel's Duality, Hamming distortion measure, random-coding
exponent, sphere-packing exponent, symmetric channels, discrete memoryless sources and
channels.

1 Introduction



Traditionally, source and channel coding have been treated independently, resulting in 
what we call a tandem (or separate) coding system. This is because Shannon in 1948 
[45] showed that separate source and channel coding incurs no loss of optimality (in 
terms of reliable transmissibility) provided that the coding blocklength goes to infinity. 
In practical implementations, however, extremely long blocklengths carry a price in delay
and complexity. To begin, we note that joint source-channel coding (JSCC)
might be expected to offer improvements for the combination of a source with significant 
redundancy and a channel with significant noise, since, for such a system, tandem coding 
would involve source coding to remove redundancy and then channel coding to insert 
redundancy. It is a natural conjecture that this is not the most efficient approach (even if 
the blocklength is allowed to grow without bound). Indeed, Shannon [45] made this point 
as follows: 

... However, any redundancy in the source will usually help if it is utilized
at the receiving point. In particular, if the source already has a certain
redundancy and no attempt is made to eliminate it in matching to the channel,
this redundancy will help combat noise. For example, in a noiseless telegraph
channel one could save about 50% in time by proper encoding of the messages.
This is not done and most of the redundancy of English remains in the channel
symbols. This has the advantage, however, of allowing considerable noise in
the channel. A sizable fraction of the letters can be received incorrectly and still
reconstructed by the context. In fact this is probably not a bad approximation
to the ideal in many cases ...

The study of JSCC dates back to as early as the 1960's. Over the years, many works 
have introduced JSCC techniques and illustrated (analytically or numerically) their bene- 
fits (in terms of both performance improvement and increased robustness to variations in 
channel noise) over tandem coding for given source and channel conditions and fixed com- 
plexity and/or delay constraints. In JSCC systems, the designs of the source and channel 
codes are either well coordinated or combined into a single step. Examples of (both 
constructive and theoretical) previous lossless and lossy JSCC investigations include: 

(a) JSCC theorems and the separation principle [6], [10], [15], [20], [23], [26], [28], [29], 
[32], [51];

(b) source codes that are robust against channel errors such as optimal (or sub-optimal)
quantizer design for noisy channels [4], [9], [21], [22], [25], [33]-[35], [39], [41], [47], 
[48], [50]; 

(c) channel codes that exploit the source's natural redundancy (if no source coding is 
applied) or its residual redundancy (if source coding is applied) [3], [27], [38], [44], 
[57]; 

(d) zero-redundancy channel codes with optimized codeword assignment for the trans- 
mission of source encoder indices over noisy channels (e.g., [21], [54]); 

(e) unequal error protection source and channel codes where the rates of the source and 
channel codes are adjusted to provide various levels of protection to the source data 
depending on its level of importance and the channel conditions (e.g., [30], [40]); 

(f) uncoded source-channel matching where the source is uncoded, directly matched to 
the channel and optimally decoded (e.g., [2], [24], [46], [53]). 

The above references are far from exhaustive as the field of JSCC has been quite active, 
particularly over the last 20 years. 

In order to learn more about the performance of the best codes as a function of 
blocklength, much research has focused on the error exponent or reliability function for 
source or channel coding (see, e.g., [13], [19], [23], [31], [37], [52]). Roughly speaking, the
error exponent E is a number with the property that the probability of decoding error
of a good code is approximately $2^{-En}$ for codes of large blocklength n. Thus the error
exponent can be used to estimate the trade-off between error probability and blocklength.
In this paper we use the error exponent as a tool to compare the performance of tandem
coding and JSCC. While jointly coding the source and channel offers no advantages over
tandem coding in terms of reliable transmissibility of the source over the channel (for
the case of memoryless systems as well as the wider class of stationary information stable
[15, 28] systems), it is possible that the same error performance can be achieved for smaller
blocklengths via optimal JSCC.
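
As a rough numerical illustration of this trade-off, the following sketch inverts the first-order approximation $P_e \approx 2^{-En}$ to estimate the blocklength needed for a target error probability. The function name and the exponent values are hypothetical and chosen only for illustration; sub-exponential factors are ignored.

```python
import math

def blocklength_for_target(error_prob, exponent):
    """Estimate n from the first-order approximation error_prob = 2**(-exponent * n)."""
    if not (0 < error_prob < 1) or exponent <= 0:
        raise ValueError("need 0 < error_prob < 1 and exponent > 0")
    return -math.log2(error_prob) / exponent

# Hypothetical exponents (bits per channel use); these are not values from the paper.
n_single = blocklength_for_target(1e-6, 0.05)
n_double = blocklength_for_target(1e-6, 0.10)
print(f"E = 0.05: n ~ {n_single:.0f};  E = 0.10: n ~ {n_double:.0f}")
```

Under this approximation, doubling the exponent halves the estimated blocklength, which is the sense in which a larger exponent translates into a shorter delay.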

The first quantitative result on error exponents for lossless JSCC was a lower bound on 
the error exponent derived in 1964 by Gallager [23, pp. 534-535]. This result also indicates 
that JSCC can lead to a larger exponent than the tandem coding exponent, the expo- 
nent resulting from separately performing and concatenating optimal source and channel 
coding. In 1980, Csiszar [17] established a lower bound (based on the random-coding
channel error exponent) and an upper bound for the JSCC error exponent $E_J(Q,W,t)$
of a communication system with transmission rate t source symbols/channel symbol and
consisting of a discrete memoryless source (DMS) with distribution Q and a discrete
memoryless channel (DMC) with transition distribution W. He showed that the upper bound,
which is expressed as the minimum of the sum of $t\,e(R/t, Q)$ and $E(R, W)$ over R, i.e.,

$$E_J(Q,W,t) \le \min_{R} \left[ t\,e\!\left(\frac{R}{t}, Q\right) + E(R,W) \right], \tag{1}$$

where e(R, Q) is the source error exponent [13], [17], [31] and E(R, W) is the channel error 
exponent [17], [23], [31], is tight if the latter minimum is attained for an R strictly larger 
than the critical rate of the channel. Another (looser) upper bound to Ej(Q, W, t) directly 
results from (1) by replacing E(R, W) by the sphere-packing channel error exponent. He 
extended this work in 1982 [18] to obtain a new expurgated lower bound (based on the 
expurgated channel exponent) for the above system under some conditions, and to deal 
with lossy coding relative to a distortion threshold. Our first objective in this work is 
to recast Csiszar's results in a form more suitable for computation and to examine the 
connection between Csiszar's upper and lower bounds, and also the relation between the 
lower bounds of Gallager and Csiszar. After this, we go on to compare the tandem 
coding and joint coding error exponents in order to discover how much potential for 
improvement there is via JSCC. Since error exponents give only asymptotic expressions 
for system performance, our results do not have direct application to the construction 
of good codes. Rather, they point out certain systems for which a search for good joint 
codes might prove fruitful. 

We first investigate the analytical computation of Csiszar's random-coding lower bound 
and sphere-packing upper bound for the JSCC error exponent. By applying Fenchel's 
Duality Theorem [36] regarding the optimization of the sum of two convex functions, 
we provide equivalent expressions for these bounds which involve a maximization over 
a non-negative parameter of the difference between the concave hull of Gallager's chan- 
nel function and Gallager's source function [23]; hence, they can be readily computed 
for arbitrary source-channel pairs by applying Arimoto's algorithm [8]. When the chan- 
nel's distribution is symmetric [23], our bounds admit closed-form parametric expressions. 
We also provide formulas of the rates for which the bounds are attained and establish 
explicit computable conditions in terms of Q and W under which the upper and lower 
bounds coincide; in this case, Ej can be determined exactly. A byproduct of our results
is the observation that Csiszar's JSCC random-coding lower bound can be larger than
Gallager's earlier lower bound obtained in [23]. Using a similar approach, we obtain the 
equivalent expression of Csiszar's expurgated lower bound [18] and establish the condition 
when the random-coding lower bound can be improved by the expurgated bound. As an 
example, we give closed-form parametric expressions of the improved lower bound and 
the corresponding condition for equidistant DMCs. 

We next employ our results to provide a systematic comparison of the JSCC exponent
$E_J(Q,W,t)$ and the tandem coding exponent $E_T(Q,W,t)$ for a DMS-DMC pair (Q,W)
with the same transmission rate t. Since $E_J \ge E_T$ in general (as tandem coding is a special
case of JSCC), we are particularly interested in investigating the situation where $E_J > E_T$.
Indeed, this inequality, when it holds, provides a theoretical underpinning and justification
for JSCC design as opposed to the widely used tandem approach, since the former method
will yield a faster exponential rate of decay for the error probability, which may translate
into substantial reductions in complexity and delay for real-world communication systems.
We establish sufficient (computable) conditions for which $E_J > E_T$ for any given
source-channel pair (Q,W), which are satisfied for a large class of memoryless source-channel
pairs. Furthermore, we show that $E_J \le 2E_T$. Numerical examples show that $E_J$ can
be nearly twice as large as $E_T$ for many DMS-DMC pairs. Thus, for the same error
probability, JSCC would require around half the delay of tandem coding. This potential
benefit translates into more than 2 dB power gain for binary DMS sent over binary-input
quantized-output additive white Gaussian noise and memoryless Rayleigh-fading
channels.

We also partially address the computation of Csiszar's lower and upper bounds for
the lossy JSCC exponent with distortion threshold $\Delta$, $E_J^{\Delta}(Q, W, t)$. For the
Hamming distortion measure, a binary DMS and an arbitrary DMC, we express
the bounds for $E_J^{\Delta}(Q,W,t)$ and the rates for which the bounds are attained as in the
lossless case.

The rest of this paper is arranged as follows. In Section 2 we describe the system, define
the terminology and introduce some material on convexity and Fenchel duality. Section 3
is devoted to the analytical computation of $E_J$ based on Csiszar's work [17], [18]. In
Section 4, we assess the merits of JSCC by comparing $E_J$ with $E_T$. The computation of
the lossy JSCC exponent is partially studied in Section 5. Finally, we state our conclusions
in Section 6.



2 Definitions and System Description 



2.1 System 

We consider throughout this paper a communication system consisting of a DMS $\{Q : \mathcal{S}\}$
with finite alphabet $\mathcal{S}$ and distribution Q, and a DMC $\{W : \mathcal{X} \to \mathcal{Y}\}$ with finite input
alphabet $\mathcal{X}$, finite output alphabet $\mathcal{Y}$, and transition probability $W = P_{Y|X}$. Without
loss of generality we assume that $Q(s) > 0$ for each $s \in \mathcal{S}$. Also, if the source distribution
is uniform, optimal (lossless) JSCC amounts to optimal channel coding which is already
well-studied. Therefore, we assume throughout that Q is not the uniform distribution on
$\mathcal{S}$ except in Section 5 where we deal with JSCC under a fidelity criterion.

A joint source-channel (JSC) code with blocklength n and transmission rate $t > 0$
(measured in source symbols/channel use) is a pair of mappings $f_n : \mathcal{S}^{tn} \to \mathcal{X}^n$ and
$\varphi_n : \mathcal{Y}^n \to \mathcal{S}^{tn}$. That is, blocks $s^{tn} = (s_1, s_2, \ldots, s_{tn})$ of source symbols of length tn
are encoded as blocks $x^n = (x_1, x_2, \ldots, x_n) = f_n(s^{tn})$ of symbols from $\mathcal{X}$ of length n,
transmitted, received as blocks $y^n = (y_1, y_2, \ldots, y_n)$ of symbols from $\mathcal{Y}$ of length n and
decoded as blocks of source symbols $\varphi_n(y^n)$ of length tn. The probability of erroneously
decoding the block is

$$P_e^{(n)}(Q,W,t) \triangleq \sum_{\{(s^{tn},\, y^n)\,:\, \varphi_n(y^n) \ne s^{tn}\}} Q_{tn}(s^{tn})\, P_{n,Y|X}\big(y^n \mid f_n(s^{tn})\big).$$

Here, $Q_{tn}$ and $P_{n,Y|X}$ are the tn- and n-dimensional product distributions corresponding
to Q and $P_{Y|X}$ respectively.

Throughout the paper, log will denote a base 2 logarithm, $|\mathcal{S}|$ will mean the number
of elements in $\mathcal{S}$ and similarly for the other alphabets, C will denote the capacity of the
DMC given by

$$C \triangleq \max_{P_X} I(P_X; W),$$

where $I(P_X; W)$ is the mutual information between the channel input and the channel
output [23]. Finally, $H(\cdot)$ will denote the entropy of a discrete probability distribution.

2.2 Error Exponents 

Definition 1 The JSCC error exponent $E_J(Q,W,t)$ is defined as the largest number
E for which there exists a sequence of JSC codes $(f_n, \varphi_n)$ with transmission rate t and
blocklength n such that

$$E \le \liminf_{n \to \infty} -\frac{1}{n} \log P_e^{(n)}(Q,W,t).$$

When there is no possibility of confusion, $E_J(Q,W,t)$ will be written as $E_J$. We know
from the JSCC theorem (e.g., [16, p. 216], [23]) that $E_J$ can be positive if and only if
$tH(Q) < C$.
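
The positivity condition can be checked numerically. The sketch below is a minimal illustration that assumes a binary symmetric channel (BSC), whose capacity has the closed form $1 - h(\epsilon)$; the helper names and the parameter values are hypothetical.

```python
import math

def entropy(P):
    """Entropy in bits of a discrete distribution given as a list of probabilities."""
    return -sum(p * math.log2(p) for p in P if p > 0)

def bsc_capacity(eps):
    """Capacity (bits per channel use) of a binary symmetric channel with crossover eps."""
    return 1.0 - entropy([eps, 1.0 - eps])

def exponent_can_be_positive(Q, eps, t):
    """Check the condition t*H(Q) < C for a DMS Q sent over a BSC(eps) at rate t."""
    return t * entropy(Q) < bsc_capacity(eps)

# Hypothetical source and channel parameters.
print(exponent_can_be_positive([0.1, 0.9], 0.1, 1.0))
print(exponent_can_be_positive([0.1, 0.9], 0.1, 1.25))
```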

For future use, we recall the source and channel functions used by Gallager [23] in 
his treatment of the JSCC theorem. We also introduce some useful notation and some 
elementary relations among these functions. Let Gallager's source function be 

$$E_s(\rho, Q) \triangleq (1+\rho) \log \sum_{s \in \mathcal{S}} Q(s)^{\frac{1}{1+\rho}}, \quad \rho \ge 0. \tag{2}$$
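
A minimal sketch of how the source function (2) can be evaluated, with the distribution passed as a plain list of probabilities; the function name and the example source are assumptions made for illustration.

```python
import math

def source_function(rho, Q):
    """Gallager's source function E_s(rho, Q) of (2), in bits, for rho >= 0."""
    if rho < 0:
        raise ValueError("rho must be non-negative")
    return (1.0 + rho) * math.log2(sum(q ** (1.0 / (1.0 + rho)) for q in Q))

Q = [0.1, 0.9]  # hypothetical binary source
for rho in (0.0, 0.5, 1.0):
    print(rho, source_function(rho, Q))
```

Two sanity checks follow directly from (2): the function vanishes at $\rho = 0$, and for a uniform source it reduces to $\rho \log |\mathcal{S}|$.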

Let 

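$$E_0(\rho, P_X, W) \triangleq -\log \sum_{y \in \mathcal{Y}} \left( \sum_{x \in \mathcal{X}} P_X(x)\, P_{Y|X}^{\frac{1}{1+\rho}}(y|x) \right)^{1+\rho}, \quad \rho \ge 0, \tag{3}$$

and

$$E_x(\rho, P_X, W) \triangleq -\rho \log \sum_{x \in \mathcal{X}} \sum_{x' \in \mathcal{X}} P_X(x) P_X(x') \left( \sum_{y \in \mathcal{Y}} \sqrt{P_{Y|X}(y|x)\, P_{Y|X}(y|x')} \right)^{\frac{1}{\rho}}, \quad \rho \ge 1. \tag{4}$$

$P_X$ in (3) and (4) is an unspecified probability distribution on $\mathcal{X}$. Connected with these
functions are the source error exponent,

$$e(R,Q) = \sup_{0 \le \rho < \infty} [\rho R - E_s(\rho, Q)], \tag{5}$$

and three intermediate channel error exponents

$$E_r(R, P_X, W) \triangleq \max_{0 \le \rho \le 1} [E_0(\rho, P_X, W) - \rho R], \tag{6}$$

$$E_{ex}(R, P_X, W) \triangleq \sup_{\rho \ge 1} [E_x(\rho, P_X, W) - \rho R], \tag{7}$$

and

$$E_{sp}(R, P_X, W) \triangleq \sup_{0 \le \rho < \infty} [E_0(\rho, P_X, W) - \rho R]. \tag{8}$$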



From these, we can form the random-coding lower bound for the channel error exponent
$E(R, W)$,

$$E_r(R,W) = \max_{P_X} E_r(R, P_X, W), \tag{9}$$

the expurgated lower bound

$$E_{ex}(R,W) \triangleq \max_{P_X} E_{ex}(R, P_X, W), \tag{10}$$

and the sphere-packing upper bound

$$E_{sp}(R,W) \triangleq \max_{P_X} E_{sp}(R, P_X, W). \tag{11}$$

In other words, $\max\{E_r(R,W), E_{ex}(R,W)\} \le E(R,W) \le E_{sp}(R,W)$. Also, we can form
Gallager's channel functions

$$E_0(\rho, W) \triangleq \max_{P_X} E_0(\rho, P_X, W) \tag{12}$$

and

$$E_x(\rho, W) \triangleq \max_{P_X} E_x(\rho, P_X, W). \tag{13}$$


It should be noted that maximization over $P_X$ means maximization over the closed
bounded set $\{(p_1, \ldots, p_{|\mathcal{X}|}) : p_i \ge 0, \sum_i p_i = 1\}$. Thus, if the function involved is
continuous, the maximum is achieved for some distribution $P_X$.
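
For a fixed input distribution, the function (3) and the exponent (6) can be evaluated directly. The sketch below assumes the channel is given as a row-stochastic nested list `W[x][y]` and replaces the maximization over $\rho$ in (6) by a grid search; the function names, the grid size and the BSC example are assumptions made for illustration, not part of the original development.

```python
import math

def gallager_e0(rho, P, W):
    """E_0(rho, P_X, W) of (3) in bits; W[x][y] is the probability of output y given input x."""
    total = 0.0
    for y in range(len(W[0])):
        inner = sum(P[x] * W[x][y] ** (1.0 / (1.0 + rho)) for x in range(len(P)))
        total += inner ** (1.0 + rho)
    return -math.log2(total)

def random_coding_exponent(R, P, W, steps=1000):
    """Approximate E_r(R, P_X, W) of (6) by a grid search over rho in [0, 1]."""
    return max(gallager_e0(k / steps, P, W) - (k / steps) * R for k in range(steps + 1))

eps = 0.1                                   # hypothetical crossover probability
W_bsc = [[1 - eps, eps], [eps, 1 - eps]]    # binary symmetric channel
P_uniform = [0.5, 0.5]
print(random_coding_exponent(0.2, P_uniform, W_bsc))
```

The grid search is only a stand-in for an exact one-dimensional maximization; since $E_0(\rho, P_X, W)$ is concave in $\rho$, a finer grid or a bisection on the derivative would tighten the estimate.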

The functions $E_r(R, P_X, W)$ and $E_{sp}(R, P_X, W)$ in (6) and (8) are equal if the
maximizing $\rho \le 1$ in (8) or, equivalently, if $R \ge R_{cr}(P_X, W)$, where $R_{cr}(P_X, W)$ is the critical
rate of the channel W under distribution $P_X$, defined by

$$R_{cr}(P_X, W) \triangleq \left. \frac{\partial E_0(\rho, P_X, W)}{\partial \rho} \right|_{\rho = 1}. \tag{14}$$

For all $P_X$, $E_r(R, P_X, W)$ and $E_{sp}(R, P_X, W)$ vanish for all $R \ge C$. Consequently, their
maxima over $P_X$, $E_r(R, W)$ and $E_{sp}(R, W)$, vanish for $R \ge C$ and are equal on some
interval $[R_{cr}(W), C]$ where $R_{cr}(W)$ is the critical rate of the channel and is defined by

$$R_{cr}(W) = \inf\{R : E_r(R, W) = E_{sp}(R, W)\}. \tag{15}$$

Furthermore, it is known that $E_{sp}(R, W)$ meets $E_r(R, W)$ on its supporting line of slope
$-1$ [19, p. 171], which means that $E_r(R, W)$ is a straight line with slope $-1$ for
$R \le R_{cr}(W)$ and hence

$$E_r(R, W) = E_0(1, W) - R, \quad R \le R_{cr}(W). \tag{16}$$
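
The critical rate (14) is a derivative and can be estimated by a finite difference. The following sketch does this for a fixed input distribution; the step size, the function names and the BSC example are assumptions made for illustration.

```python
import math

def gallager_e0(rho, P, W):
    """E_0(rho, P_X, W) of (3) in bits; W[x][y] is the probability of output y given input x."""
    total = 0.0
    for y in range(len(W[0])):
        inner = sum(P[x] * W[x][y] ** (1.0 / (1.0 + rho)) for x in range(len(P)))
        total += inner ** (1.0 + rho)
    return -math.log2(total)

def critical_rate(P, W, h=1e-5):
    """R_cr(P_X, W) of (14): central-difference estimate of dE_0/drho at rho = 1."""
    return (gallager_e0(1.0 + h, P, W) - gallager_e0(1.0 - h, P, W)) / (2.0 * h)

eps = 0.1                                   # hypothetical crossover probability
W_bsc = [[1 - eps, eps], [eps, 1 - eps]]
print(critical_rate([0.5, 0.5], W_bsc))
```

For a BSC with uniform input, differentiating (3) gives $1 - h(q)$ with $q = \sqrt{\epsilon}/(\sqrt{\epsilon} + \sqrt{1-\epsilon})$ and $h$ the binary entropy function, which provides an independent check of the estimate.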

For all $P_X$, the function $E_{ex}(R, P_X, W)$ is a decreasing convex curve with a straight-line
section of slope $-1$ for $R \ge R_{ex}(P_X, W)$, and $E_{ex}(R, P_X, W) > E_r(R, P_X, W)$ for
$R < R_{ex}(P_X, W)$, where $R_{ex}(P_X, W)$ is the "expurgated" rate of the channel W under
distribution $P_X$, defined by

$$R_{ex}(P_X, W) \triangleq \left. \frac{\partial E_x(\rho, P_X, W)}{\partial \rho} \right|_{\rho = 1}. \tag{17}$$

Since the above are satisfied for all $P_X$, we then obtain the following relation between the
two lower bounds: $E_r(R, W) < E_{ex}(R, W)$ for $R < R_{ex}(W)$ and $E_r(R, W) \ge E_{ex}(R, W)$
otherwise, where

$$R_{ex}(W) \triangleq \inf\{R : E_r(R, W) = E_{ex}(R, W)\} \tag{18}$$

is the expurgated rate of the channel. Furthermore, it is known that $E_{ex}(R, W)$ and
$E_r(R, W)$ meet their supporting line of slope $-1$ (according to the fact that $E_0(1, W) =
E_x(1, W)$) [23, p. 154]. This geometric relation implies that $R_{ex}(W) \le R_{cr}(W)$ and
$E_r(R, W) = E_{ex}(R, W)$ is a straight line in the region $[R_{ex}(W), R_{cr}(W)]$.

We remark that Csiszar [17] defines $e(R, Q)$, $E_r(R, P_X, W)$, and $E_{sp}(R, P_X, W)$ using
expressions involving constrained minima of Kullback-Leibler divergences. He also defines
$E_{ex}(R, P_X, W)$ in terms of the Bhattacharyya distance and the mutual information between
two channel inputs. Our expressions are equivalent, as can be shown by the Lagrange
multiplier method; see also [19, pp. 192-193] and [13].

2.3 Tilted Distributions 

We associate with the source distribution Q a family of tilted distributions defined
by

$$Q^{(\rho)}(s) \triangleq \frac{Q^{\frac{1}{1+\rho}}(s)}{\sum_{s' \in \mathcal{S}} Q^{\frac{1}{1+\rho}}(s')}, \quad s \in \mathcal{S}, \quad \rho \ge 0. \tag{19}$$

Lemma 1 [19, p. 44] The entropy $H(Q^{(\rho)})$ is a strictly increasing function of $\rho$ except in
the case that $Q(s) = 1/|\mathcal{S}|$ for all $s \in \mathcal{S}$. Moreover, for $H(Q) \le R \le \log |\mathcal{S}|$, the equation
$H(Q^{(\rho)}) = R$ is satisfied by a unique value $\rho^*$ (where we define $\rho^* = \infty$ if $R = \log |\mathcal{S}|$ and
define $H(Q^{(\infty)}) = \log |\mathcal{S}|$).

The proof that $H(Q^{(\rho)})$ is increasing follows easily from differentiation with respect to $\rho$
and a use of the Cauchy-Schwarz inequality. The remainder of the proof follows from the
facts that $H(Q^{(0)}) = H(Q)$, $\lim_{\rho \to \infty} H(Q^{(\rho)}) = \log |\mathcal{S}|$ and that $H(Q^{(\rho)})$ is a continuous
function of $\rho$.

It is easily seen that

$$H(Q^{(\rho)}) = \frac{\partial E_s(\rho, Q)}{\partial \rho}, \tag{20}$$

where $E_s(\rho, Q)$ is defined by (2). From this we see that for $R \ge H(Q)$ the supremum in
(5) is achieved at $\rho^*$.
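
The relation (20) between the tilted distribution and the source function can be verified numerically. The sketch below compares a central-difference derivative of $E_s(\rho, Q)$ with the entropy of the tilted distribution; the function names, the step size and the example source are assumptions made for illustration.

```python
import math

def tilted(Q, rho):
    """Tilted distribution Q^(rho) of (19)."""
    weights = [q ** (1.0 / (1.0 + rho)) for q in Q]
    norm = sum(weights)
    return [w / norm for w in weights]

def entropy(P):
    """Entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in P if p > 0)

def source_function(rho, Q):
    """Gallager's source function E_s(rho, Q) of (2), in bits."""
    return (1.0 + rho) * math.log2(sum(q ** (1.0 / (1.0 + rho)) for q in Q))

Q = [0.1, 0.9]       # hypothetical non-uniform binary source
rho, h = 0.7, 1e-5
slope = (source_function(rho + h, Q) - source_function(rho - h, Q)) / (2.0 * h)
print(slope, entropy(tilted(Q, rho)))
```

Evaluating `entropy(tilted(Q, rho))` for increasing `rho` also illustrates Lemma 1: the values grow from $H(Q)$ towards $\log |\mathcal{S}|$.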



2.4 Fenchel Duality 

Although many of our results can be obtained by the use of the Lagrange multiplier 
method, the Fenchel Duality Theorem gives more succinct proofs and seems particularly 
well-adapted to the elucidation of the connection between error exponents on the one 
hand, and source and channel functions on the other. 1 We present here a simplified one- 
dimensional version which is adequate for our purposes. For more detailed discussion, the 
reader may consult [36, pp. 190-202], [12, Chapter 7], or [42]. 

For any function f defined on $F \subset \mathbb{R}$, define its convex Fenchel transform (conjugate
function, Legendre transform) $f^*$ by

$$f^*(y) = \sup_{x \in F} [xy - f(x)]$$

and let $F^*$ be the set $\{y : f^*(y) < \infty\}$. It is easy to see from its definition that $f^*$ is a
convex function on $F^*$. Moreover, if f is convex and continuous, then $(f^*)^* = f$. More
generally, $f^{**} \le f$ and $f^{**}$ is the convex hull of f, i.e. the largest convex function that is
bounded above by f [42, Section 3], [12, Section 7.1].

Similarly, for any function g defined on $G \subset \mathbb{R}$, define its concave Fenchel transform
$g_*$ by

$$g_*(y) = \inf_{x \in G} [xy - g(x)]$$

and let $G_*$ be the set $\{y : g_*(y) > -\infty\}$. It is easy to see from its definition that $g_*$ is a
concave function on $G_*$. Moreover, if g is concave and continuous, then $(g_*)_* = g$. More
generally, $g_{**} \ge g$ and $g_{**}$ is the concave hull of g, i.e. the smallest concave function that
is bounded below by g.
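
The two transforms can be approximated on a finite grid, which makes the hull property easy to observe. The sketch below is a sampled illustration only: the grid, the test function `double_well` and the helper names are hypothetical, and the suprema and infima are taken over grid points rather than over an interval.

```python
def convex_transform(f, xs):
    """Return y -> max over the grid xs of [x*y - f(x)], a sampled version of f*."""
    return lambda y: max(x * y - f(x) for x in xs)

def concave_transform(g, xs):
    """Return y -> min over the grid xs of [x*y - g(x)], a sampled version of g_*."""
    return lambda y: min(x * y - g(x) for x in xs)

grid = [k / 100 for k in range(-500, 501)]  # hypothetical grid on [-5, 5]

def double_well(x):
    """A non-convex test function whose convex hull vanishes on [-1, 1]."""
    return min((x - 1.0) ** 2, (x + 1.0) ** 2)

conjugate = convex_transform(double_well, grid)
biconjugate = convex_transform(conjugate, grid)
print(double_well(0.0), biconjugate(0.0))
```

At the origin the test function is strictly above its sampled double transform, in line with $f^{**} \le f$ and with $f^{**}$ being the convex hull of f.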

Fenchel Duality Theorem [36, p. 201] Assume that f and g are, respectively, convex
and concave functions on the non-empty intervals F and G in $\mathbb{R}$ and assume that $F \cap G$
has interior points. Suppose further that $\mu = \inf_{x \in F \cap G} [f(x) - g(x)]$ is finite. Then

$$\mu = \inf_{x \in F \cap G} [f(x) - g(x)] = \max_{y \in F^* \cap G_*} [g_*(y) - f^*(y)], \tag{21}$$

where the maximum on the right is achieved by some $y_0 \in F^* \cap G_*$. If the infimum on
the left is achieved by some $x_0 \in F \cap G$, then

$$\max_{x \in F} [x y_0 - f(x)] = x_0 y_0 - f(x_0) \tag{22}$$

and

$$\min_{x \in G} [x y_0 - g(x)] = x_0 y_0 - g(x_0). \tag{23}$$

Footnote 1: Another related application of Fenchel duality is carried out in [5] in the context of guessing subject
to distortion, where it is shown that the guessing exponent is the Fenchel transform of the error exponent
for source coding with a fidelity criterion.


2.5 Properties of the Source and Channel Functions 

Lemma 2 The source function $E_s(\rho, Q)$ defined by (2) is a strictly convex function of $\rho$.

Convexity follows directly from (20) and Lemma 1. Strict convexity is a consequence of
our general assumption that Q is not the uniform distribution. It will be seen from (5)
that $e(R, Q)$ is the convex Fenchel transform of $E_s(\rho, Q)$. In fact, it is easily checked that
(e.g., cf. [19, pp. 44-45])

$$e(R, Q) = \begin{cases} 0 & \text{if } R \le H(Q), \\ D(Q^{(\rho^*)} \| Q) & \text{if } H(Q) < R \le \log |\mathcal{S}|, \\ \infty & \text{if } R > \log |\mathcal{S}|, \end{cases} \tag{24}$$

where $D(\cdot \| \cdot)$ denotes the Kullback-Leibler divergence and $\rho^*$ is the solution of $H(Q^{(\rho)}) =
R$. Note that (24) implies that $e(R, Q)$ is strictly convex in R on $[H(Q), \log |\mathcal{S}|]$ when the
source is nonuniform; otherwise $H(Q) = \log |\mathcal{S}|$.
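
Expression (24) suggests a direct way to compute the source exponent: solve $H(Q^{(\rho)}) = R$ for $\rho^*$ and evaluate the divergence. The sketch below uses bisection, which relies on the monotonicity stated in Lemma 1; the function names, the bracketing cap and the tolerance are assumptions made for illustration, and the value returned at $R = \log |\mathcal{S}|$ is only an approximation of the limiting case $\rho^* = \infty$.

```python
import math

def tilted(Q, rho):
    """Tilted distribution Q^(rho) of (19)."""
    weights = [q ** (1.0 / (1.0 + rho)) for q in Q]
    norm = sum(weights)
    return [w / norm for w in weights]

def entropy(P):
    """Entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in P if p > 0)

def source_exponent(R, Q, tol=1e-12):
    """e(R, Q) via (24): solve H(Q^(rho)) = R by bisection, then return D(Q^(rho*) || Q)."""
    if R <= entropy(Q):
        return 0.0
    if R > math.log2(len(Q)):
        return math.inf
    lo, hi = 0.0, 1.0
    while entropy(tilted(Q, hi)) < R and hi < 1e6:  # bracket rho*; H(Q^(rho)) is increasing
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if entropy(tilted(Q, mid)) < R:
            lo = mid
        else:
            hi = mid
    T = tilted(Q, 0.5 * (lo + hi))
    return sum(p * math.log2(p / q) for p, q in zip(T, Q) if p > 0)

Q = [0.1, 0.9]  # hypothetical binary source
print(source_exponent(0.8, Q))
```

The same value can be cross-checked against definition (5) by maximizing $\rho R - E_s(\rho, Q)$ over a grid of $\rho$.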

The relation between Gallager's channel function $E_0(\rho, W)$ and the random-coding
and sphere-packing bounds is more complicated. First of all, recall that for each $P_X$,
$E_r(R, P_X, W)$ as defined in (6) is a convex non-increasing function for all R, and is a
linear function of R with slope $-1$ for $R \le R_{cr}(P_X, W)$ [23, p. 143]. It will be convenient
to regard this linear function as defining $E_r(R, P_X, W)$ for all negative R. The random
coding bound $E_r(R, W)$, which is the maximum of this family of convex functions, is a
convex strictly decreasing function of R for $R \le C$, and is a linear function of R with
slope $-1$ for all R below the critical rate $R_{cr}(W)$. For $R \ge C$, $E_r(R, W) = 0$. Since
$E_r(R, W)$ is convex, $-E_r(R, W)$ is concave. Let $T_r(\rho, W)$ be the concave transform
of $-E_r(R, W)$, i.e.

$$T_r(\rho, W) \triangleq \inf_{R \in \mathbb{R}} [\rho R + E_r(R, W)]. \tag{25}$$

It follows from the properties of $E_r(R, W)$ noted above that $T_r(\rho, W) = -\infty$ for $\rho < 0$
and $\rho > 1$ and that $T_r(\rho, W)$ is finite for $\rho \in [0, 1]$.

Lemma 3 The function $T_r(\rho, W)$ defined by (25) is the concave hull on the interval
[0, 1] of the channel function $E_0(\rho, W)$ defined in (12). Thus, $E_0(\rho, W) \le T_r(\rho, W)$ for
$0 \le \rho \le 1$.

Proof: We form the concave transform of $E_0(\rho, W)$ on the interval [0, 1] to get

$$(E_0(\rho, W))_* = \inf_{0 \le \rho \le 1} [\rho R - E_0(\rho, W)] = -\sup_{0 \le \rho \le 1} [E_0(\rho, W) - \rho R].$$

Now use, in succession, (12), (6), and (9) to get

$$\begin{aligned} (E_0(\rho, W))_* &= -\sup_{0 \le \rho \le 1} \max_{P_X} [E_0(\rho, P_X, W) - \rho R] \\ &= -\max_{P_X} \sup_{0 \le \rho \le 1} [E_0(\rho, P_X, W) - \rho R] \\ &= -\max_{P_X} E_r(R, P_X, W) \\ &= -E_r(R, W). \end{aligned}$$

Since $T_r(\rho, W)$ is the concave transform of the concave function $-E_r(R, W)$, we have
that

$$(-E_r(R, W))_* = T_r(\rho, W) \quad \text{and so} \quad (E_0(\rho, W))_{**} = T_r(\rho, W).$$

Hence, $T_r(\rho, W)$ is the concave hull on [0, 1] of $E_0(\rho, W)$. ■

Similarly to the above, recall that $E_{sp}(R, W)$, defined in (11), is convex, zero for $R \ge C$,
positive for $R < C$, and finite if $R > R_\infty(W)$ [19], [23], where $R_\infty(W)$ is given by

$$R_\infty(W) \triangleq \lim_{\rho \to \infty} \frac{E_0(\rho, W)}{\rho}. \tag{26}$$

A computable expression for $R_\infty(W)$ is given in [23, p. 158]. The normal situation is
$R_\infty(W) = 0$. (As shown by Gallager, $R_\infty(W) = 0$ unless each channel output symbol
is unreachable from at least one input. In the latter case, $R_\infty(W) > 0$.) We now let
$T_{sp}(\rho, W)$ be the concave transform of the concave function $-E_{sp}(R, W)$, i.e.

$$T_{sp}(\rho, W) \triangleq \inf_{R_\infty(W) < R < \infty} [\rho R + E_{sp}(R, W)]. \tag{27}$$

It follows that $T_{sp}(\rho, W) = -\infty$ for $\rho < 0$ and that $0 \le T_{sp}(\rho, W) < \infty$ for $\rho \ge 0$.

Lemma 4 The function $T_{sp}(\rho, W)$ defined by (27) is the concave hull on $[0, \infty)$ of the
channel function $E_0(\rho, W)$ defined in (12).

Proof: We now form the concave transform of $E_0(\rho, W)$ on the interval $[0, \infty)$ to get

$$(E_0(\rho, W))_* = \inf_{0 \le \rho < \infty} [\rho R - E_0(\rho, W)] = -\sup_{0 \le \rho < \infty} [E_0(\rho, W) - \rho R].$$

Now use (12), (8), and (11) to get

$$\begin{aligned} (E_0(\rho, W))_* &= -\sup_{0 \le \rho < \infty} \max_{P_X} [E_0(\rho, P_X, W) - \rho R] \\ &= -\max_{P_X} \sup_{0 \le \rho < \infty} [E_0(\rho, P_X, W) - \rho R] \\ &= -\max_{P_X} E_{sp}(R, P_X, W) \\ &= -E_{sp}(R, W). \end{aligned}$$

As in the previous proof, $(E_0(\rho, W))_{**} = T_{sp}(\rho, W)$. Hence, $T_{sp}(\rho, W)$ is the concave hull
on $[0, \infty)$ of $E_0(\rho, W)$. ■



Observation 1 Note that the function $E_0(\rho, P_X, W)$ is concave in $\rho$ for each $P_X$ [23, p.
142]. Hence, if the maximizing $P_X$ in (12) is independent of $\rho$, $E_0(\rho, W)$ is concave and
thus $T_r(\rho, W)$ and $T_{sp}(\rho, W)$ are equal to $E_0(\rho, W)$. This situation holds if the channel
is symmetric in the sense of Gallager [23, p. 94] (also see Example 2). For this case,
the maximizing distribution is the uniform distribution $P_X(x) = 1/|\mathcal{X}|$ for all $x \in \mathcal{X}$.

However, there are channels for which $E_0(\rho, W)$ is not concave. One example of such a
channel is provided by Gallager [23, Fig. 5.6.5]. For this particular (6-ary input, 4-ary
output) channel, we plot $E_0(\rho, W)$ against $\rho$ in Fig. 1. It is noted that the derivative of
$E_0(\rho, W)$ has a positive jump increase at around $\rho = 0.51$ (see [23, Fig. 5.6.5]), and its
concave hull $T_r(\rho, W)$ is strictly larger than $E_0(\rho, W)$ in the interval $\rho \in (0.41, 0.62)$.
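
When $E_0(\rho, W)$ is available only as samples on a grid of $\rho$ values, its concave hull can be approximated by the upper convex hull of the sample points. The sketch below is one standard way to do this (a monotone-chain scan followed by linear interpolation); the function name and the toy data are hypothetical and are not values of any particular channel.

```python
def concave_hull(xs, ys):
    """Smallest concave function above the samples (xs strictly increasing), evaluated at xs."""
    hull = []  # indices of the vertices of the upper hull
    for i in range(len(xs)):
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            # Drop b when it lies on or below the chord joining a and i.
            cross = (xs[b] - xs[a]) * (ys[i] - ys[a]) - (ys[b] - ys[a]) * (xs[i] - xs[a])
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append(i)
    out = list(ys)
    for a, b in zip(hull, hull[1:]):
        for i in range(a, b + 1):
            w = (xs[i] - xs[a]) / (xs[b] - xs[a])
            out[i] = (1.0 - w) * ys[a] + w * ys[b]  # linear interpolation between vertices
    return out

# Toy samples with a dent at x = 2 (hypothetical data).
print(concave_hull([0, 1, 2, 3, 4], [0, 2, 1, 2, 0]))
```

Applied to samples of $E_0(\rho, W)$ on [0, 1], the output approximates $T_r(\rho, W)$; samples that are already concave are returned unchanged.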

3 Bounds on the JSCC Error Exponent 

3.1 Csiszar's Random-Coding and Sphere-Packing Bounds 

Csiszar [17] proved that for a DMS and a DMC the JSCC error exponent in Definition 1
satisfies

$$\underline{E}_r(Q,W,t) \le E_J(Q,W,t) \le \overline{E}_{sp}(Q,W,t), \tag{28}$$

where

$$\underline{E}_r(Q,W,t) \triangleq \min_{tH(Q) \le R \le t \log |\mathcal{S}|} \left[ t\,e\!\left(\frac{R}{t}, Q\right) + E_r(R,W) \right] \tag{29}$$

and

$$\overline{E}_{sp}(Q,W,t) \triangleq \inf_{tH(Q) \le R \le t \log |\mathcal{S}|} \left[ t\,e\!\left(\frac{R}{t}, Q\right) + E_{sp}(R,W) \right] \tag{30}$$

are called the source-channel random-coding lower bound and the source-channel
sphere-packing upper bound, since they respectively contain $E_r(R, W)$ and $E_{sp}(R, W)$ in their
expressions. These bounds can be expressed in a form more adapted to calculation as
follows.

Theorem 1 Let $tH(Q) < C$ and let $t \log |\mathcal{S}| > R_\infty(W)$. Then

$$\underline{E}_r(Q, W, t) = \max_{0 \le \rho \le 1} [T_r(\rho, W) - tE_s(\rho, Q)] \tag{31}$$

and

$$\overline{E}_{sp}(Q, W, t) = \max_{0 \le \rho < \infty} [T_{sp}(\rho, W) - tE_s(\rho, Q)] \tag{32}$$

where $T_r(\rho, W)$ and $T_{sp}(\rho, W)$ are the concave hulls of $E_0(\rho, W)$ on [0, 1] and $[0, \infty)$ defined
in (25) and (27), respectively. If the maximizing $P_X$ in (12) is independent of $\rho$, $T_r(\rho, W)$
and $T_{sp}(\rho, W)$ can be replaced by $E_0(\rho, W)$.

Remark 1 When $tH(Q) \ge C$, $\underline{E}_r(Q, W, t) = \overline{E}_{sp}(Q, W, t) = 0$.
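
The equivalence between (29) and (31) can be checked numerically on a simple case. The sketch below assumes a binary source and a binary symmetric channel, for which the uniform input maximizes (12) for every $\rho$ so that $T_r(\rho, W) = E_0(\rho, W)$; all optimizations are replaced by grid searches, and the function names, grid sizes and parameter values are assumptions made for illustration.

```python
import math

def source_function(rho, Q):
    """Gallager's source function E_s(rho, Q) of (2), in bits."""
    return (1.0 + rho) * math.log2(sum(q ** (1.0 / (1.0 + rho)) for q in Q))

def bsc_e0(rho, eps):
    """E_0(rho, W) for a BSC(eps); the uniform input is optimal for every rho."""
    z = eps ** (1.0 / (1.0 + rho)) + (1.0 - eps) ** (1.0 / (1.0 + rho))
    return rho - (1.0 + rho) * math.log2(z)

def dual_bound(Q, eps, t, steps=1000):
    """Right side of (31) with T_r = E_0, by grid search over rho in [0, 1]."""
    return max(bsc_e0(k / steps, eps) - t * source_function(k / steps, Q)
               for k in range(steps + 1))

def primal_bound(Q, eps, t, r_steps=400, rho_max=10.0, rho_steps=2000):
    """Definition (29): minimize t*e(R/t, Q) + E_r(R, W) over a grid of rates R."""
    src = [(k * rho_max / rho_steps, source_function(k * rho_max / rho_steps, Q))
           for k in range(rho_steps + 1)]
    ch = [(rho, bsc_e0(rho, eps)) for rho, _ in src if rho <= 1.0]
    h_q = -sum(q * math.log2(q) for q in Q)
    lo, hi = t * h_q, t * math.log2(len(Q))
    best = math.inf
    for i in range(r_steps + 1):
        R = lo + (hi - lo) * i / r_steps
        e_src = max(rho * R / t - es for rho, es in src)   # e(R/t, Q) from (5)
        e_ch = max(e0 - rho * R for rho, e0 in ch)         # E_r(R, W) from (6) and (9)
        best = min(best, t * e_src + e_ch)
    return best

Q, eps, t = [0.1, 0.9], 0.05, 1.0  # hypothetical source, crossover probability and rate
print(dual_bound(Q, eps, t), primal_bound(Q, eps, t))
```

The dual form needs a single one-dimensional search, whereas the primal form nests two searches inside a search over R; the two printed values should agree up to grid resolution.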

Observation 2 According to Lemma 3, $E_0(\rho, W) \le T_r(\rho, W)$. Thus the lower bound
$\underline{E}_r(Q, W, t)$ can be replaced by the possibly looser lower bound (see footnote 2)

$$\max_{0 \le \rho \le 1} [E_0(\rho, W) - tE_s(\rho, Q)]. \tag{33}$$

This is the lower bound implied by Gallager's work [23, p. 535]. As noted earlier, if the
maximizing $P_X$ in (12) is independent of $\rho$ (e.g., for symmetric channels, see Example 2),
the two lower bounds are identical.

Footnote 2: In [56], [55], we incorrectly stated that Csiszar's random-coding lower bound $\underline{E}_r(Q, W, t)$ given in
(29) and Gallager's lower bound given in (33) are identical. This is indeed not always true; it is true if
$E_0(\rho, W)$ is a concave function of $\rho$ (e.g., for symmetric channels) or $tH(Q^{(1)}) \le R_{cr}(W)$ (see Corollary
3). Thus, although both lower bounds are "random coding" type bounds, Csiszar's bound is in general
tighter.

Proof of Theorem 1: We first apply Fenchel's Duality Theorem (21) to the lower bound
$\underline{E}_r(Q, W, t)$. From Lemma 2, (5), and (24), $t\,e(R/t, Q)$ is convex on $(-\infty, t \log |\mathcal{S}|]$ and has
convex transform $tE_s(\rho, Q)$ on the set $[0, \infty)$. Also, from the discussion preceding Lemma
3, $-E_r(R, W)$ is concave on $\mathbb{R}$ and has concave transform $T_r(\rho, W)$ which is bounded on
[0, 1]. Thus, by Fenchel's Duality Theorem,

$$\inf_{-\infty < R \le t \log |\mathcal{S}|} \left[ t\,e\!\left(\frac{R}{t}, Q\right) + E_r(R, W) \right] = \max_{0 \le \rho \le 1} [T_r(\rho, W) - tE_s(\rho, Q)]. \tag{34}$$

Now the convex function $t\,e(R/t, Q) + E_r(R, W)$ is non-increasing for $R \le tH(Q)$ since
$t\,e(R/t, Q) = 0$ in this region. This implies that the infimum on the left side of (34)
can be restricted to the interval $tH(Q) \le R \le t \log |\mathcal{S}|$. Since this is now the infimum
of a continuous function on a finite interval this will be a minimum. Hence, (31) is an
equivalent representation of $\underline{E}_r(Q, W, t)$.

Similarly, for the upper bound, recall from the discussion preceding Lemma 4 that
$-E_{sp}(R, W)$ is concave and finite for $R > R_\infty(W)$ and has a concave transform $T_{sp}(\rho, W)$,
which is finite on $0 \le \rho < \infty$. Thus, by Fenchel's Duality Theorem,

$$\inf_{R_\infty(W) < R \le t \log |\mathcal{S}|} \left[ t\,e\!\left(\frac{R}{t}, Q\right) + E_{sp}(R, W) \right] = \max_{0 \le \rho < \infty} [T_{sp}(\rho, W) - tE_s(\rho, Q)]. \tag{35}$$

The assumption $R_\infty(W) < t \log |\mathcal{S}|$ ensures that the infimum on the left of (35) is taken
over a set with interior points. If $R_\infty(W) < tH(Q)$, the infimum can be replaced by a
minimum on the interval $tH(Q) \le R \le t \log |\mathcal{S}|$ by the same argument as for the lower
bound. If $R_\infty(W) \ge tH(Q)$, we no longer form the infimum of a continuous function, but
it can still be shown that there is a minimum point which lies in the interval $tH(Q) \le
R \le t \log |\mathcal{S}|$. Hence, (32) is an equivalent representation of $\overline{E}_{sp}(Q, W, t)$. ■

Observation 3 The parametric form of the lower and upper bounds (31) and (32) indeed
facilitates the computation of Csiszar's bounds. In order to compute the bounds for
general non-symmetric channels (when $tH(Q) < C$ and $t \log |\mathcal{S}| > R_\infty(W)$), one could employ
Arimoto's algorithm [8] to find the maximizing distribution and thus $E_0(\rho, W)$. We then
can immediately obtain the concave hulls of $E_0(\rho, W)$, $T_r(\rho, W)$ and $T_{sp}(\rho, W)$, numerically
(e.g., using Matlab) and thus the maxima of $T_r(\rho, W) - tE_s(\rho, Q)$ and $T_{sp}(\rho, W) -
tE_s(\rho, Q)$. This significantly reduces the computational complexity since to compute (29)
and (30), we need to first compute $E_r(R, W)$ and $E_{sp}(R, W)$ for each R, which requires
almost the same complexity as above, and then we need to find the minima by searching
over all R's. For symmetric channels, (31) and (32) are analytically solved; see Example 2.


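Example 1 Consider a communication system with a binary DMS with distribution
$Q = \{q, 1 - q\}$ and a DMC with $|\mathcal{X}| = 6$, $|\mathcal{Y}| = 4$, and transition probability matrix

$$W = \begin{pmatrix} 1 - 18\epsilon & 6\epsilon & 6\epsilon & 6\epsilon \\ 6\epsilon & 1 - 18\epsilon & 6\epsilon & 6\epsilon \\ 6\epsilon & 6\epsilon & 1 - 18\epsilon & 6\epsilon \\ 6\epsilon & 6\epsilon & 6\epsilon & 1 - 18\epsilon \\ 0.5 - \epsilon & 0.5 - \epsilon & \epsilon & \epsilon \\ \epsilon & \epsilon & 0.5 - \epsilon & 0.5 - \epsilon \end{pmatrix}.$$
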

We then compute Csiszar's random-coding and sphere-packing bounds, $\underline{E}_r(Q,W,t)$
and $\overline{E}_{sp}(Q, W,t)$. For fixed $Q$ and transmission rate $t$, we plot these bounds in terms of
$\varepsilon$ in Fig. 2. Our numerical results show that $E_J$ can be determined exactly for a large
class of $(q, \varepsilon, t)$ triplets: when $Q = \{0.1, 0.9\}$ and $t = 0.75$, $E_J$ is exactly
known for $\varepsilon > 0.0025$; when $Q = \{0.1, 0.9\}$ and $t = 1$, $E_J$ is known for $\varepsilon > 0.002$; and
when $Q = \{0.2, 0.8\}$ and $t = 1.25$, $E_J$ is known for $\varepsilon > 0.001$. Since for this channel
$E_0(\rho, W)$ might not be concave (e.g., when $\varepsilon = 0.01$, $W$ reduces to the DMC discussed
in Observation 1 at the end of Section 2), our results indicate that Csiszar's lower bound
is slightly but strictly larger (by 0.0001) than Gallager's lower bound (33) for $q = 0.1$,
$t = 1$, and $\varepsilon$ around 0.02. This is illustrated in Fig. 3.
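
For readers who wish to reproduce the setting of Example 1 numerically, the channel matrix
can be built as in the Python sketch below. The row layout is our reading of the matrix of
Example 1, chosen so that every row sums to one and so that $\varepsilon = 0.01$ gives the entries
0.82, 0.06, 0.49 and 0.01; the function name is hypothetical.

```python
import numpy as np

def example1_channel(eps):
    """6 x 4 transition matrix W(y|x) of Example 1; each row is one input symbol x."""
    a, b = 1.0 - 18.0 * eps, 6.0 * eps
    w = np.array([
        [a, b, b, b],
        [b, a, b, b],
        [b, b, a, b],
        [b, b, b, a],
        [0.5 - eps, 0.5 - eps, eps, eps],
        [eps, eps, 0.5 - eps, 0.5 - eps],
    ])
    if np.any(w < 0.0):
        raise ValueError("eps must satisfy 0 <= eps <= 1/18")
    return w

w = example1_channel(0.01)
```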



3.2 When Does $\underline{E}_r(Q, W, t) = \overline{E}_{sp}(Q, W, t)$?

One important objective in investigating the bounds for the JSCC error exponent $E_J$ is to
ascertain when the bounds are tight so that the exact value of $E_J$ is obtained. According
to Csiszar's result (28), if the minimum in the expression of $\underline{E}_r(Q, W,t)$ or
$\overline{E}_{sp}(Q, W,t)$ is attained for a rate (strictly) larger than the critical rate $R_{cr}(W)$, then the
two bounds coincide and thus $E_J$ is determined exactly. This raises the following question:
how can we check whether the minimum in $\underline{E}_r(Q, W,t)$ or $\overline{E}_{sp}(Q, W,t)$ is attained for a
rate larger than $R_{cr}(W)$? One may indeed wonder if there exist explicit conditions for
which $\underline{E}_r(Q,W,t) = \overline{E}_{sp}(Q,W,t)$. The answer is affirmative; furthermore, we can verify
whether the two bounds are tight in two ways: one is to compare $tH(Q^{(1)})$ with $R_{cr}(W)$,
and the other is to compare the maximizer in the expression (32) of $\overline{E}_{sp}(Q, W, t)$,
$\overline{\rho}^*$ say, with 1. Before we present these conditions, we first define the following quantities
which achieve the bounds $\underline{E}_r(Q, W,t)$ and $\overline{E}_{sp}(Q, W,t)$ under the assumptions
$tH(Q) < C$ and $t\log|\mathcal{S}| > R_\infty$:



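$$\underline{R}_m \triangleq \arg\min_{tH(Q) \le R \le t\log|\mathcal{S}|} \left[ te\left(\frac{R}{t}, Q\right) + E_r(R, W) \right], \tag{36}$$

$$\overline{R}_m \triangleq \arg\min_{tH(Q) \le R \le t\log|\mathcal{S}|} \left[ te\left(\frac{R}{t}, Q\right) + E_{sp}(R, W) \right], \tag{37}$$

$$\underline{\rho}^* \triangleq \arg\max_{0 \le \rho \le 1} \left[ T_r(\rho, W) - tE_s(\rho, Q) \right], \tag{38}$$

$$\overline{\rho}^* \triangleq \arg\max_{0 \le \rho < \infty} \left[ T_{sp}(\rho, W) - tE_s(\rho, Q) \right]. \tag{39}$$

Since the functions between brackets to be minimized (or maximized) in (36)-(39) are
strictly convex (or concave) functions of $R$ (or $\rho$), $\underline{R}_m$, $\overline{R}_m$, $\underline{\rho}^*$ and $\overline{\rho}^*$ are
well-defined and unique. We then have the following relations.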

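Lemma 5 Let $tH(Q) < C$ and let $t\log|\mathcal{S}| > R_\infty(W)$. Then:

(1) $\overline{\rho}^*$ and $\underline{\rho}^*$ are positive and finite.

(2) $\overline{R}_m = tH(Q^{(\overline{\rho}^*)})$.

(3) $\underline{R}_m = tH(Q^{(\underline{\rho}^*)})$ if $\underline{\rho}^* < 1$; $\underline{R}_m \ge tH(Q^{(\underline{\rho}^*)})$ if $\underline{\rho}^* = 1$.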

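Proof: We first prove (1). Since $T_{sp}(\rho, W)$ is the concave hull of $E_0(\rho, W)$, we have the
following relation

$$\lim_{\rho \downarrow 0} \frac{T_{sp}(\rho, W)}{\rho} \ge \lim_{\rho \downarrow 0} \frac{E_0(\rho, W)}{\rho} = C,$$

where the last equality follows from [7, Lemma 2]. Since $\lim_{\rho \downarrow 0} E_s(\rho, Q)/\rho = H(Q)$ by
(20) and Lemma 1, we have

$$\lim_{\rho \downarrow 0} \frac{T_{sp}(\rho, W) - tE_s(\rho, Q)}{\rho} \ge C - tH(Q) > 0.$$

Note that the right-derivative of $T_{sp}(\rho, W)$ (at $\rho = 0$) must exist due to its concavity [43,
pp. 113-114], and hence $\lim_{\rho \downarrow 0} T_{sp}(\rho, W)/\rho$ exists. Next we denote
$\epsilon \triangleq t\log|\mathcal{S}| - R_\infty(W) > 0$. It follows from the definition of $T_{sp}(\rho, W)$ that

$$\lim_{\rho \to \infty} \frac{T_{sp}(\rho, W)}{\rho} \le \lim_{\rho \to \infty} \frac{\rho\,(R_\infty(W) + \epsilon/2) + E_{sp}(R_\infty(W) + \epsilon/2, W)}{\rho} = R_\infty(W) + \frac{\epsilon}{2}$$

because of the finiteness of $E_{sp}(R, W)$ for $R > R_\infty(W)$. This together with
$\lim_{\rho \to \infty} E_s(\rho, Q)/\rho = \log|\mathcal{S}|$ implies

$$\lim_{\rho \to \infty} \frac{T_{sp}(\rho, W) - tE_s(\rho, Q)}{\rho} \le R_\infty(W) + \frac{\epsilon}{2} - t\log|\mathcal{S}| = -\frac{\epsilon}{2} < 0.$$

Since $T_{sp}(\rho, W) - tE_s(\rho, Q)$ is 0 and has a positive right-slope at $\rho = 0$, and is negative
for $\rho$ sufficiently large, by the strict concavity of $T_{sp}(\rho, W) - tE_s(\rho, Q)$, the maximum in
(39) must be achieved by a positive finite $\overline{\rho}^*$. The positivity of $\underline{\rho}^*$ can be shown in the
same way and $\underline{\rho}^*$ is finite by its definition.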

We next prove (2). If we now regard $te(R/t, Q)$ as $f^*(y)$ and $tE_s(\rho, Q)$ as $f(x)$ (by
noting that $f^{**} = f$), then according to (22) in Fenchel's Duality Theorem,

$$\max_{0 \le \rho < \infty} \left[ \rho \overline{R}_m - tE_s(\rho, Q) \right] = \overline{\rho}^* \overline{R}_m - tE_s(\overline{\rho}^*, Q).$$

Setting the derivative of $\rho \overline{R}_m - tE_s(\rho, Q)$ equal to 0, we can solve for the stationary point³
$\overline{\rho}^*$, which gives $\overline{R}_m = tH(Q^{(\overline{\rho}^*)})$.

For the lower bound, using a similar argument, we obtain the relation

$$\max_{0 \le \rho \le 1} \left[ \rho \underline{R}_m - tE_s(\rho, Q) \right] = \underline{\rho}^* \underline{R}_m - tE_s(\underline{\rho}^*, Q).$$

Recalling that the function between the brackets to be maximized is strictly concave, if
the above maximum is achieved by $\underline{\rho}^* \in (0, 1)$, then we can solve for the stationary point
as above and obtain $\underline{R}_m = tH(Q^{(\underline{\rho}^*)})$. If the maximum is achieved at $\underline{\rho}^* = 1$, then the
stationary point is beyond (at least equal to) 1, and hence $\underline{R}_m \ge tH(Q^{(\underline{\rho}^*)})$. Thus (3)
follows. ■
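
The stationarity step in the proof rests on the identity $\partial E_s(\rho, Q)/\partial\rho = H(Q^{(\rho)})$,
where $Q^{(\rho)}(s) \propto Q(s)^{1/(1+\rho)}$ is the tilted source distribution. The Python sketch below
checks this identity by a central finite difference and returns the rate $tH(Q^{(\rho)})$ at which
a given $\rho$ is stationary. The function names, the test distribution and the step size are
hypothetical choices made only for illustration; entropies are in nats.

```python
import numpy as np

def tilted(q, rho):
    """Tilted distribution Q^(rho)(s), proportional to Q(s)^(1/(1+rho))."""
    w = np.power(np.asarray(q, dtype=float), 1.0 / (1.0 + rho))
    return w / w.sum()

def entropy(p):
    """Shannon entropy in nats, ignoring zero-probability symbols."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def source_es(rho, q):
    """E_s(rho, Q) = (1 + rho) log sum_s Q(s)^(1/(1+rho))."""
    q = np.asarray(q, dtype=float)
    return (1.0 + rho) * np.log(np.sum(np.power(q, 1.0 / (1.0 + rho))))

def stationary_rate(q, t, rho):
    """Rate R at which rho is the stationary point of rho*R - t*E_s(rho, Q)."""
    return t * entropy(tilted(q, rho))

q, t, rho, h = [0.1, 0.9], 0.75, 0.3, 1e-5
# Central difference approximation of dE_s/drho at rho.
slope = (source_es(rho + h, q) - source_es(rho - h, q)) / (2 * h)
```

At $\rho = 1$ the tilted distribution is proportional to $\sqrt{Q(s)}$, which is the distribution
entering the source critical rate.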

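In order to summarize the explicit conditions for the calculation of $E_J$ it is convenient
to define a critical rate for the source by

$$R_{cr}^{(s)}(Q) \triangleq \left. \frac{\partial E_s(\rho, Q)}{\partial \rho} \right|_{\rho = 1} = H(Q^{(1)}), \tag{40}$$

recalling that $Q^{(1)}(s) = \sqrt{Q(s)} \big/ \sum_{s' \in \mathcal{S}} \sqrt{Q(s')}$, $s \in \mathcal{S}$.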
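
Theorem 2 Let $tH(Q) < C$ and let $t\log|\mathcal{S}| > R_\infty(W)$. Then

• $tR_{cr}^{(s)}(Q) \ge R_{cr}(W) \iff \overline{\rho}^* \le 1 \iff tR_{cr}^{(s)}(Q) \ge \underline{R}_m = \overline{R}_m \ge R_{cr}(W)$. In this case,

$$E_J(Q, W, t) = T_{sp}(\overline{\rho}^*, W) - tE_s(\overline{\rho}^*, Q).$$

• $tR_{cr}^{(s)}(Q) < R_{cr}(W) \iff \overline{\rho}^* > 1 \iff R_{cr}(W) \ge \overline{R}_m > \underline{R}_m = tR_{cr}^{(s)}(Q)$. In this case,

$$E_0(1, W) - tE_s(1, Q) \le E_J(Q, W, t) \le T_{sp}(\overline{\rho}^*, W) - tE_s(\overline{\rho}^*, Q).$$
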
3 The stationary points of a differentiable function f(x) are the solutions of f'(x) = 0. 
Remark 2 Under the condition $tR_{cr}^{(s)}(Q) > R_{cr}(W)$, $\overline{\rho}^* = 1$ is possible. However, if
$tR_{cr}^{(s)}(Q) = R_{cr}(W)$, then we definitely have $\overline{\rho}^* = 1$ and $tR_{cr}^{(s)}(Q) = \underline{R}_m = \overline{R}_m = R_{cr}(W)$.

Remark 3 It can be shown that $T_{sp}(1, W) = E_0(1, W)$ and thus when $\overline{\rho}^* = 1$, the JSCC
exponent is determined by

$$E_J(Q, W, t) = E_0(1, W) - tE_s(1, Q).$$

Corollary 1 Let $tH(Q) < C$ and let $t\log|\mathcal{S}| > R_\infty(W)$. Then $\underline{\rho}^* = \min\{1, \overline{\rho}^*\}$ and
$\underline{R}_m = tH(Q^{(\underline{\rho}^*)})$.

The proof of Theorem 2 relies on a geometric argument using the left- and right-slopes
of the convex functions $E_r(R, W)$ and $E_{sp}(R, W)$ and is deferred to Appendix A.
Corollary 1 can be regarded as a complement of Lemma 5 (3) and is also proved in
Appendix A.

Corollary 2 If $\underline{R}_m > R_{cr}(W)$ or $\overline{R}_m > R_{cr}(W)$, then $tR_{cr}^{(s)}(Q) \ge \underline{R}_m = \overline{R}_m > R_{cr}(W)$,
and the other equivalent conditions in Theorem 2 hold.

Proof: If $\underline{R}_m > R_{cr}(W)$ or $\overline{R}_m > R_{cr}(W)$, then $\underline{R}_m = \overline{R}_m$ by Lemma 9 in Appendix A.
$tR_{cr}^{(s)}(Q) \ge \underline{R}_m$ immediately follows from Corollary 1. ■

Remark 4 Corollary 2 states that if $\underline{R}_m > R_{cr}(W)$ or $\overline{R}_m > R_{cr}(W)$, then $E_J$ is
determined exactly. Note that when $\overline{R}_m = R_{cr}(W)$, the upper and lower bounds of $E_J$ may
not be tight. In that case $\underline{R}_m < R_{cr}(W) = \overline{R}_m$ is possible. The relation between $\underline{R}_m$ and
$\overline{R}_m$ is summarized in Lemma 9 in Appendix A.

We point out that, for both computation and analysis, the above conditions
play an important role in verifying whether $E_J$ can be determined exactly or not. For the
class of symmetric DMCs, we can use the conditions $tR_{cr}^{(s)}(Q) \ge R_{cr}(W)$ and
$tR_{cr}^{(s)}(Q) < R_{cr}(W)$ to derive explicit formulas for $E_J$; see Example 2. In Section 4, we
apply Theorem 2 to establish the conditions for which the JSCC exponent is larger than the
tandem coding exponent. Note that when $tR_{cr}^{(s)}(Q) \le R_{cr}(W)$, the source-channel
random-coding bound admits a simple expression

$$\underline{E}_r(Q, W, t) = E_0(1, W) - tE_s(1, Q). \tag{41}$$

Consequently, we have the following statement.

Corollary 3 If $tR_{cr}^{(s)}(Q) \le R_{cr}(W)$, then Csiszar's random-coding bound and Gallager's
lower bound (33) are identical.

Proof: Recall Gallager's lower bound to $E_J$ given by (33):

$$\max_{0 \le \rho \le 1} \left[ E_0(\rho, W) - tE_s(\rho, Q) \right] \ge E_0(1, W) - tE_s(1, Q).$$

Since in general Gallager's lower bound cannot be larger than Csiszar's random-coding
bound, they must be equal when $tR_{cr}^{(s)}(Q) \le R_{cr}(W)$. ■



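Example 2 (DMS and Symmetric DMC) Consider a DMS $\{Q : \mathcal{S}\}$ and a symmetric⁴
DMC $\{W : \mathcal{X} \to \mathcal{Y}\}$ with rate $t$, where the channel transition matrix $W$ can be partitioned
along its columns into sub-matrices $W_1, W_2, \ldots, W_s$, such that in each $W_i$ with size
$|\mathcal{X}| \times |\mathcal{Y}_i|$, each row is a permutation of each other row and each column is a permutation of
each other column. Denote the transition probabilities in any column of sub-matrix $W_i$,
$i = 1, 2, \ldots, s$, by $\{p_{i1}, p_{i2}, \ldots, p_{i|\mathcal{X}|}\}$. Then both $E_0(\rho, W)$ and the channel capacity are
achieved by the uniform distribution $P_X = 1/|\mathcal{X}|$ and have the form

$$E_0(\rho, W) = (1 + \rho)\log|\mathcal{X}| - \log\left[ \sum_{i=1}^{s} |\mathcal{Y}_i| \left( \sum_{j=1}^{|\mathcal{X}|} p_{ij}^{\frac{1}{1+\rho}} \right)^{1+\rho} \right] \tag{42}$$

and

$$C = \log|\mathcal{X}| - \frac{1}{|\mathcal{X}|} \sum_{i=1}^{s} |\mathcal{Y}_i| \left( \sum_{j=1}^{|\mathcal{X}|} p_{ij} \right) H(P_i^{(0)}),$$

where the tilted distribution $P_i^{(\alpha)}$, $\alpha \ge 0$, for each $i = 1, 2, \ldots, s$, is defined on
$I_X = \{1, 2, \ldots, |\mathcal{X}|\}$ by

$$P_i^{(\alpha)}(j) \triangleq \frac{p_{ij}^{\frac{1}{1+\alpha}}}{\sum_{k=1}^{|\mathcal{X}|} p_{ik}^{\frac{1}{1+\alpha}}}, \qquad j \in I_X.$$

Since now $E_0(\rho, W)$ is a concave and differentiable function of $\rho$, the bounds $\underline{E}_r(Q, W,t)$
and $\overline{E}_{sp}(Q, W, t)$ can be analytically obtained. If

$$\frac{1}{|\mathcal{X}|} \sum_{i=1}^{s} |\mathcal{Y}_i| \left( \sum_{j=1}^{|\mathcal{X}|} p_{ij} \right) H(P_i^{(0)}) + tH(Q) < \log|\mathcal{X}| \tag{43}$$

and

$$\frac{\sum_{i=1}^{s} |\mathcal{Y}_i| \left( \sum_{j=1}^{|\mathcal{X}|} \sqrt{p_{ij}} \right)^2 H(P_i^{(1)})}{\sum_{i=1}^{s} |\mathcal{Y}_i| \left( \sum_{j=1}^{|\mathcal{X}|} \sqrt{p_{ij}} \right)^2} + tH(Q^{(1)}) \ge \log|\mathcal{X}|, \tag{44}$$

then the source-channel exponent is positive and is exactly determined by

$$E_J(Q, W, t) = (1 + \overline{\rho}^*)\log|\mathcal{X}| - \log\left[ \sum_{i=1}^{s} |\mathcal{Y}_i| \left( \sum_{j=1}^{|\mathcal{X}|} p_{ij}^{\frac{1}{1+\overline{\rho}^*}} \right)^{1+\overline{\rho}^*} \right] - t(1 + \overline{\rho}^*)\log \sum_{s \in \mathcal{S}} Q(s)^{\frac{1}{1+\overline{\rho}^*}}, \tag{45}$$

where $\overline{\rho}^*$ is the unique root of the equation

$$\frac{\sum_{i=1}^{s} |\mathcal{Y}_i| \left( \sum_{j=1}^{|\mathcal{X}|} p_{ij}^{\frac{1}{1+\rho}} \right)^{1+\rho} H(P_i^{(\rho)})}{\sum_{i=1}^{s} |\mathcal{Y}_i| \left( \sum_{j=1}^{|\mathcal{X}|} p_{ij}^{\frac{1}{1+\rho}} \right)^{1+\rho}} + tH(Q^{(\rho)}) = \log|\mathcal{X}|. \tag{46}$$

In the case when (43) does not hold, which means $tH(Q) \ge C$, $E_J(Q,W,t) = 0$. When
(43) holds but (44) does not hold, the right-hand side of (45) becomes the upper bound
$\overline{E}_{sp}(Q, W, t)$ and meanwhile, $E_J$ is lower bounded by $E_0(1, W) - tE_s(1, Q)$, where $E_0(\rho, W)$
is given by (42).

⁴ Here symmetry is defined in the Gallager sense [23, p. 94]; it is a generalization of the standard notion
of symmetry [16] (which corresponds to $s = 1$ above).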

Now we apply the conditions (43) and (44) to a communication system with a binary
source with distribution $\{q, 1 - q\}$, a binary symmetric channel (BSC) with crossover
probability $\varepsilon$ and transmission rates $t = 0.5$, 0.75, 1, and 1.25. Note that

$$R_{cr}(W) = 1 - h_b\left( \frac{\sqrt{\varepsilon}}{\sqrt{\varepsilon} + \sqrt{1 - \varepsilon}} \right)$$

and

$$R_{cr}^{(s)}(Q) = h_b\left( \frac{\sqrt{q}}{\sqrt{q} + \sqrt{1 - q}} \right),$$

where $h_b(\cdot)$ is the binary entropy function. In Fig. 4, we partition the set of possible
points for the $(\varepsilon, q)$ pairs into three regions: A, B and C. If $(\varepsilon, q) \in$ B, where conditions
(43) and (44) hold, i.e., $tH(Q) < C$ and $tR_{cr}^{(s)}(Q) \ge R_{cr}(W)$, then the corresponding $E_J$ is
positive and exactly known.⁵ Furthermore, if $(\varepsilon, q) \in$ C, then $E_J$ is bounded above (below,
respectively) by the right-hand side of (45) ($E_0(1,W) - tE_s(1,Q)$, respectively). When
$(\varepsilon, q) \in$ A, where $tH(Q) \ge C$, $E_J$ is zero, and the error probability of this communication
system converges to 1 as $n$ grows. So we are only interested in the cases when
$(\varepsilon, q) \in$ B $\cup$ C.

⁵ In light of the recent work in [11], where the random coding exponent $E_r(R, W)$ of the BSC is shown
to be indeed the true value of the channel error exponent $E(R, W)$ for code rates $R$ in some interval
directly below the channel critical rate (in other words, it is shown that for the BSC with its $\varepsilon$ above a
certain threshold, $E_r(R,W) = E(R,W)$ for $R_1 \le R \le C$ where $R_1$ can be less than $R_{cr}(W)$ [11]), we
note via (1) and the lower bound in (28)-(29) that region B where $E_J$ is exactly known can be enlarged.
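
The partition into regions A, B and C for the binary source and BSC can be sketched in
Python as follows. This is a minimal illustration, assuming rates in bits and using the
standard closed forms $C = 1 - h_b(\varepsilon)$, $R_{cr}(W) = 1 - h_b(\sqrt{\varepsilon}/(\sqrt{\varepsilon} + \sqrt{1-\varepsilon}))$ and
$R_{cr}^{(s)}(Q) = h_b(\sqrt{q}/(\sqrt{q} + \sqrt{1-q}))$; the function names are hypothetical.

```python
import math

def hb(p):
    """Binary entropy function in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def tilt_half(p):
    """sqrt(p) / (sqrt(p) + sqrt(1 - p)): the rho = 1 tilting of {p, 1 - p}."""
    return math.sqrt(p) / (math.sqrt(p) + math.sqrt(1.0 - p))

def region(eps, q, t):
    """Classify (eps, q) at rate t into region 'A', 'B' or 'C'."""
    capacity = 1.0 - hb(eps)
    if t * hb(q) >= capacity:
        return "A"  # tH(Q) >= C: the exponent is zero
    r_cr_channel = 1.0 - hb(tilt_half(eps))
    r_cr_source = hb(tilt_half(q))
    # B: exponent exactly known; C: only upper and lower bounds are available.
    return "B" if t * r_cr_source >= r_cr_channel else "C"
```

For instance, `region(0.1, 0.1, 1.0)` falls in region B, since $tR_{cr}^{(s)}(Q) \approx 0.811$ bits
exceeds $R_{cr}(W) \approx 0.189$ bits while $tH(Q) \approx 0.469 < C \approx 0.531$.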



3.3 Csiszar's Expurgated Lower Bound 

In [18], Csiszar extended his work and obtained another lower bound to $E_J$ for a class of
source-channel pairs: for a DMS and a DMC with zero-error capacity equal to 0,

$$E_J(Q, W, t) \ge \underline{E}_{ex}(Q, W, t) \tag{47}$$

if $E_{ex}(R, W) = \max_{P_X} E_{ex}(R, P_X, W)$ is attained for a $P_X$ not depending on $R$, where

$$\underline{E}_{ex}(Q, W, t) \triangleq \min_{tH(Q) \le R \le t\log|\mathcal{S}|} \left[ te\left(\frac{R}{t}, Q\right) + E_{ex}(R, W) \right] \tag{48}$$

is called the source-channel expurgated lower bound since it contains $E_{ex}(R,W)$ in its
expression. We then use Fenchel's Duality Theorem to derive an equivalent expression of
$\underline{E}_{ex}(Q, W, t)$.

Theorem 3 For a DMS and a DMC with zero-error capacity equal to 0, if $E_{ex}(R, W) =
\max_{P_X} E_{ex}(R, P_X, W)$ is attained for a $P_X$ not depending on $R$, then

$$\underline{E}_{ex}(Q, W, t) = \sup_{\rho \ge 1} \left[ E_x(\rho, W) - tE_s(\rho, Q) \right]. \tag{49}$$


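Proof: Recall that $E_x(\rho, P_X, W)$ is concave in $\rho$ on the interval $G = [1, +\infty)$ [23, pp. 153-
154]. Note that

$$-E_{ex}(R, P_X, W) \triangleq -\sup_{\rho \ge 1} \left[ E_x(\rho, P_X, W) - \rho R \right] = \inf_{\rho \ge 1} \left[ \rho R - E_x(\rho, P_X, W) \right]$$

is the concave transform of $E_x(\rho, P_X, W)$ on $R \in G^* = \{R : -E_{ex}(R, P_X, W) > -\infty\} =
[0, +\infty)$ for DMCs with zero-error capacity equal to 0. Also recall that $tE_s(\rho, Q)$ is strictly
convex in $\rho$ on the interval $F = [0, +\infty)$. Its convex transform

$$\sup_{\rho \ge 0} \left[ \rho R - tE_s(\rho, Q) \right] = te\left(\frac{R}{t}, Q\right)$$

is a function of $R$ on $F^* = \{R : te(R/t, Q) < +\infty\} = (-\infty, t\log|\mathcal{S}|]$. Fenchel's Duality
Theorem states that

$$\inf_{\rho \ge 1} \left[ tE_s(\rho, Q) - E_x(\rho, P_X, W) \right] = \max_{0 \le R \le t\log|\mathcal{S}|} \left[ -E_{ex}(R, P_X, W) - te\left(\frac{R}{t}, Q\right) \right]$$

or

$$\sup_{\rho \ge 1} \left[ E_x(\rho, P_X, W) - tE_s(\rho, Q) \right] = \min_{0 \le R \le t\log|\mathcal{S}|} \left[ te\left(\frac{R}{t}, Q\right) + E_{ex}(R, P_X, W) \right].$$

We can now maximize over $P_X$ and get the two equivalent lower bounds:

$$\sup_{\rho \ge 1} \left[ E_x(\rho, W) - tE_s(\rho, Q) \right] = \max_{P_X} \min_{0 \le R \le t\log|\mathcal{S}|} \left[ te\left(\frac{R}{t}, Q\right) + E_{ex}(R, P_X, W) \right]$$

$$\overset{(a)}{=} \min_{0 \le R \le t\log|\mathcal{S}|} \left[ te\left(\frac{R}{t}, Q\right) + \max_{P_X} E_{ex}(R, P_X, W) \right]$$

$$\overset{(b)}{=} \min_{tH(Q) \le R \le t\log|\mathcal{S}|} \left[ te\left(\frac{R}{t}, Q\right) + E_{ex}(R, W) \right],$$

where (a) follows by the assumption that the maximizing $P_X$ does not depend on $R$ and
(b) holds since the convex function $te(R/t,Q) + E_{ex}(R,W)$ is either infinity or strictly
decreasing for $R < tH(Q)$. ■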

In the following lemma we note that the supremum in (49) can be replaced by a
maximum, and the relation between the maximizer $\rho_x^*$ and its dual minimizer $R_{xm}$ is
given.

Lemma 6 For a DMC with zero-error capacity equal to 0, the function $E_x(\rho, W) - tE_s(\rho, Q)$
has a global maximum at a finite $\rho \ge 1$. Let

$$\rho_x^* \triangleq \arg\max_{\rho \ge 1} \left[ E_x(\rho, W) - tE_s(\rho, Q) \right] \tag{50}$$

and

$$R_{xm} \triangleq \arg\min_{tH(Q) \le R \le t\log|\mathcal{S}|} \left[ te\left(\frac{R}{t}, Q\right) + E_{ex}(R, W) \right]. \tag{51}$$

Then $R_{xm} = tH(Q^{(\rho_x^*)})$ if $\rho_x^* > 1$; $R_{xm} \le tR_{cr}^{(s)}(Q)$ if $\rho_x^* = 1$.

Remark 5 Since the function between brackets to be optimized in (50) (or (51)) is strictly
concave (or convex), $\rho_x^*$ and $R_{xm}$ are well-defined and unique.



Proof: We first show that $\rho_x^*$ is finite. Recall that for any $P_X$, Gallager's source and
channel functions $E_s(\rho, Q)$ and $E_x(\rho; P_X, W)$ given in (4) at $\rho = 1$ reduce to

$$E_s(1, Q) = \log\left( \sum_{s \in \mathcal{S}} \sqrt{Q(s)} \right)^2$$

and

$$E_x(1; P_X, W) = -\log \sum_{y \in \mathcal{Y}} \left( \sum_{x \in \mathcal{X}} P_X(x) \sqrt{P_{Y|X}(y|x)} \right)^2.$$

Using Jensen's inequality [16] on the convex function $x^2$, we obtain

$$E_s(1, Q) \le \log \sum_{s \in \mathcal{S}} Q(s)\, Q(s)^{-1} = \log|\mathcal{S}|$$

with equality if and only if $Q$ is uniform, and

$$E_x(1; P_X, W) \ge -\log \sum_{y \in \mathcal{Y}} \sum_{x \in \mathcal{X}} P_X(x) P_{Y|X}(y|x) = 0.$$

Therefore,

$$E_x(1, W) - tE_s(1, Q) > -t\log|\mathcal{S}|$$

because of the nonuniform source assumption. On the other hand, because the zero-error
capacity is 0, we know that $\lim_{\rho \to \infty} E_x(\rho, W)/\rho = 0$ (from [23, p. 155]) and hence

$$\lim_{\rho \to \infty} \frac{E_x(\rho, W) - tE_s(\rho, Q)}{\rho} = -t\log|\mathcal{S}| < 0.$$

Clearly, since the concave function $E_x(\rho,W) - tE_s(\rho,Q)$ is finite (bounded below) at
$\rho = 1$, and approaches $-\infty$ as $\rho \to \infty$, there exists a global maximum at a finite $\rho_x^*$.
We next show the relation between $\rho_x^*$ and $R_{xm}$. Following the proof of Theorem 3, let
$f^*(y)$ be $te(R/t,Q)$ and let $f(x)$ be $tE_s(\rho,Q)$. Fenchel's Duality Theorem (22) says that
$\rho_x^*$ and $R_{xm}$ should satisfy

$$\max_{\rho \ge 1} \left[ \rho R_{xm} - tE_s(\rho, Q) \right] = \rho_x^* R_{xm} - tE_s(\rho_x^*, Q).$$

If $\rho_x^* > 1$, then $\rho_x^*$ is the stationary point of the concave function $\rho R_{xm} - tE_s(\rho, Q)$, and
hence

$$R_{xm} = tH(Q^{(\rho_x^*)}).$$

Otherwise (if $\rho_x^* = 1$), the stationary point is less than or equal to 1, so that

$$R_{xm} \le tR_{cr}^{(s)}(Q).$$

■

Analogously to Theorem 2, we have the following explicit conditions regarding the
expurgated lower bound to the JSCC exponent.

Theorem 4 For the expurgated lower bound in Theorem 3, the following conditions are
equivalent.

• $tR_{cr}^{(s)}(Q) < R_{ex}(W) \iff \rho_x^* > 1 \iff tR_{cr}^{(s)}(Q) < R_{xm} < R_{ex}(W)$. Thus,

$$E_J(Q, W, t) \ge E_x(\rho_x^*, W) - tE_s(\rho_x^*, Q).$$

• $tR_{cr}^{(s)}(Q) \ge R_{ex}(W) \iff \rho_x^* = 1 \iff R_{xm} = tR_{cr}^{(s)}(Q) \ge R_{ex}(W)$. Thus,

$$E_J(Q, W, t) \ge E_x(1, W) - tE_s(1, Q).$$

The proof of Theorem 4 is similar to that of Theorem 2 and is hence omitted. We next
use Theorems 2 and 4 to compare Csiszar's random-coding and expurgated lower bounds.
Of clear interest is the case when the expurgated bound improves upon the random-coding
bound.

Corollary 4 The source-channel random-coding bound is improved by the expurgated
bound (i.e., $\underline{E}_r(Q, W, t) < \underline{E}_{ex}(Q, W, t)$) if and only if $tR_{cr}^{(s)}(Q) < R_{ex}(W)$.

Proof: When $tR_{cr}^{(s)}(Q) < R_{ex}(W)$, we must have that $tR_{cr}^{(s)}(Q) < R_{cr}(W)$, since $R_{ex}(W)$
is never larger than $R_{cr}(W)$. It follows from Theorem 2 that the random-coding lower
bound is attained at $\underline{R}_m = tR_{cr}^{(s)}(Q)$. By Theorem 4 the expurgated lower bound is
attained at $R_{ex}(W) > R_{xm} > tR_{cr}^{(s)}(Q)$. On account of Lemma 6, this must happen if
$R_{xm} = tH(Q^{(\rho_x^*)})$ with $\rho_x^* > 1$. Thus, $R_{xm} > \underline{R}_m$ and

$$\underline{E}_r(Q, W, t) = E_r(\underline{R}_m, W) + te\left(\frac{\underline{R}_m}{t}, Q\right)$$
$$< E_r(R_{xm}, W) + te\left(\frac{R_{xm}}{t}, Q\right)$$
$$\le E_{ex}(R_{xm}, W) + te\left(\frac{R_{xm}}{t}, Q\right)$$
$$= \underline{E}_{ex}(Q, W, t).$$

In this case, the source-channel expurgated lower bound is tighter than the random-coding
lower bound. We then show that $\underline{E}_r(Q, W,t) \ge \underline{E}_{ex}(Q, W,t)$ if $tR_{cr}^{(s)}(Q) \ge R_{ex}(W)$.
When $R_{ex}(W) \le tR_{cr}^{(s)}(Q) < R_{cr}(W)$, it follows from Theorems 2 and 4 that

$$\underline{E}_r(Q, W, t) = E_0(1, W) - tE_s(1, Q)$$
$$= E_x(1, W) - tE_s(1, Q),$$

where the second equality follows from the fact that, for any $P_X$, Gallager's channel
functions $E_0(1, P_X, W)$ and $E_x(1, P_X, W)$ are equal [23], and hence their maxima are
equal. In this case, the source-channel random-coding lower bound is identical to the
expurgated lower bound.

When $tR_{cr}^{(s)}(Q) \ge R_{cr}(W)$, we must have $tR_{cr}^{(s)}(Q) \ge R_{ex}(W)$. Then the expurgated
lower bound is attained at $R_{xm} = tR_{cr}^{(s)}(Q)$ by Theorem 4. On account of Theorem 2 and
Corollary 1, the random-coding lower bound is attained at $\underline{R}_m = tH(Q^{(\underline{\rho}^*)}) \ge R_{cr}(W)$
with $\underline{\rho}^* \le 1$. Consequently,

$$\underline{E}_r(Q, W, t) = E_r(\underline{R}_m, W) + te\left(\frac{\underline{R}_m}{t}, Q\right)$$
$$\ge E_{ex}(\underline{R}_m, W) + te\left(\frac{\underline{R}_m}{t}, Q\right)$$
$$\ge E_{ex}(R_{xm}, W) + te\left(\frac{R_{xm}}{t}, Q\right)$$
$$= \underline{E}_{ex}(Q, W, t).$$

In this case, the source-channel random-coding lower bound is tighter than or equal to
the expurgated lower bound. ■



Example 3 (DMS and Equidistant DMC) A DMC $W = P_{Y|X}$ is called equidistant
if there exists a number $\beta > 0$ such that for all pairs of inputs $x \ne \tilde{x}$,

$$\sum_{y \in \mathcal{Y}} \sqrt{P_{Y|X}(y|x)\, P_{Y|X}(y|\tilde{x})} = \beta.$$

Note that equidistant DMCs have zero-error capacity 0, and every DMC with binary input
alphabet is equidistant. It is shown in [31] that for an equidistant channel, $E_x(\rho, W)$ is
achieved in the range $\rho \ge 1$ by a uniform input distribution $P_X(x) = 1/|\mathcal{X}|$. Therefore,
we can write $E_x(\rho, W)$ as

$$E_x(\rho, W) = -\rho \log\left( \frac{|\mathcal{X}| - 1}{|\mathcal{X}|}\, \beta^{\frac{1}{\rho}} + \frac{1}{|\mathcal{X}|} \right) \quad \text{for } \rho \ge 1.$$

Now we apply Theorems 3 and 4 to a DMS $Q$ and an equidistant DMC $W$ with transmission
rate $t$. We then see that if

$$tH(Q^{(1)}) + \log\left( \frac{|\mathcal{X}| - 1}{|\mathcal{X}|}\, \beta + \frac{1}{|\mathcal{X}|} \right) < \frac{\beta \log \beta}{\beta + \frac{1}{|\mathcal{X}| - 1}}, \tag{52}$$

the expurgated JSCC lower bound is tighter than the random-coding lower bound and is
given by

$$E_J(Q, W, t) \ge -\rho_x^* \log\left( \frac{|\mathcal{X}| - 1}{|\mathcal{X}|}\, \beta^{\frac{1}{\rho_x^*}} + \frac{1}{|\mathcal{X}|} \right) - t(1 + \rho_x^*) \log \sum_{s \in \mathcal{S}} Q(s)^{\frac{1}{1+\rho_x^*}}, \tag{53}$$

where $\rho_x^*$ is the unique root of the equation

$$tH(Q^{(\rho)}) + \log\left( \frac{|\mathcal{X}| - 1}{|\mathcal{X}|}\, \beta^{\frac{1}{\rho}} + \frac{1}{|\mathcal{X}|} \right) = \frac{\beta^{\frac{1}{\rho}} \log \beta}{\rho \left( \beta^{\frac{1}{\rho}} + \frac{1}{|\mathcal{X}| - 1} \right)}.$$

Consider a communication system with a binary source with distribution $\{q, 1 - q\}$,
a binary erasure channel (BEC) with erasure probability $\alpha$ and transmission rate $t = 1$
(similar results hold for other cases, as in the last example). Using the conditions (43),
(44) in Example 2, and together with (52), we present in Fig. 5 the set of $(\alpha, q)$ points,
partitioned into four regions. If the pair $(\alpha, q)$ is located in region B, then the system
$E_J$ is positive and exactly known. If $(\alpha, q) \in$ C $=$ C$_1 \cup$ C$_2$, then upper and lower
bounds for $E_J$ are known. Here, region C$_2$ consists of the values of $(\alpha, q)$ for which the
source-channel expurgated lower bound given in (53) is tighter than the source-channel
random-coding lower bound. Finally, when $(\alpha, q) \in$ A, $E_J(Q,W,t) = 0$. In Fig. 6, we
plot the random-coding and expurgated lower bounds for different source and BEC pairs.
We observe that when the source distribution is $Q = \{0.1, 0.9\}$ (respectively $Q = \{0.2, 0.8\}$),
the expurgated lower bound for $E_J$ is tighter than the random-coding lower bound if
$\alpha < 0.0297$ (respectively if $\alpha < 0.0102$).
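
The comparison of Corollary 4 can be checked numerically for an equidistant channel with a
short Python sketch. It assumes that $R_{ex}(W)$ is the slope $\partial E_x(\rho, W)/\partial\rho$ at $\rho = 1$ (the
usual definition, not restated in this section), uses $\beta = \alpha$ for the BEC, and works in nats;
the function names are hypothetical and $\beta > 0$ is required.

```python
import math

def source_critical_rate(q):
    """R_cr^(s)(Q) = H(Q^(1)) in nats, with Q^(1)(s) proportional to sqrt(Q(s))."""
    roots = [math.sqrt(p) for p in q]
    z = sum(roots)
    return -sum((r / z) * math.log(r / z) for r in roots if r > 0)

def equidistant_rex(beta, nx):
    """Slope of E_x(rho, W) = -rho log(((nx-1)/nx) beta^(1/rho) + 1/nx) at rho = 1."""
    a = ((nx - 1) / nx) * beta + 1.0 / nx
    return -math.log(a) + ((nx - 1) / nx) * beta * math.log(beta) / a

def expurgated_is_tighter(q, beta, nx, t):
    """Corollary 4: the expurgated bound wins iff t R_cr^(s)(Q) < R_ex(W)."""
    return t * source_critical_rate(q) < equidistant_rex(beta, nx)

# Binary source {0.1, 0.9} over a BEC (beta = alpha) at rate t = 1.
print(expurgated_is_tighter([0.1, 0.9], beta=0.02, nx=2, t=1.0))
print(expurgated_is_tighter([0.1, 0.9], beta=0.04, nx=2, t=1.0))
```

With these assumptions the switch between the two outcomes for $Q = \{0.1, 0.9\}$ occurs near
$\alpha \approx 0.03$, in line with the threshold 0.0297 quoted above.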

4 When is JSCC Worthwhile: JSCC vs Tandem Coding Exponents

4.1 Tandem Coding Error Exponent 

A tandem code $(f_n^*, \varphi_n^*) \triangleq (f_{cn} \circ f_{sn}, \varphi_{sn} \circ \varphi_{cn})$ for a DMS $\{Q : \mathcal{S}\}$ and a DMC
$\{W : \mathcal{X} \to \mathcal{Y}\}$ with blocklength $n$ and transmission rate $t$ (source symbols/channel use) is
composed independently of a $(tn, M)$ block source code $(f_{sn}, \varphi_{sn})$ defined by
$f_{sn} : \mathcal{S}^{tn} \to \{1, 2, \ldots, M\}$ and $\varphi_{sn} : \{1, 2, \ldots, M\} \to \mathcal{S}^{tn}$ with source code rate

$$R_s \triangleq \frac{\log M}{tn} \quad \text{source code bits/source symbol},$$

and an $(n, M)$ block channel code $(f_{cn}, \varphi_{cn})$ defined by $f_{cn} : \{1, 2, \ldots, M\} \to \mathcal{X}^n$ and
$\varphi_{cn} : \mathcal{Y}^n \to \{1, 2, \ldots, M\}$ with channel code rate

$$R_c \triangleq \frac{\log M}{n} \quad \text{source code bits/channel use},$$

where "$\circ$" means composition and $R_s$ and $R_c$ are independent of $n$. That is, blocks $s^{tn}$ of
source symbols of length $tn$ are encoded as integers (indices) $f_{sn}(s^{tn})$ from $\{1, 2, \ldots, M\}$,
and these integers are further encoded as blocks $x^n = f_{cn}[f_{sn}(s^{tn})]$ of symbols from $\mathcal{X}$ of
length $n$, transmitted, and received as blocks $y^n$ of symbols from $\mathcal{Y}$ of length $n$. These received
blocks $y^n$ are decoded as integers $\varphi_{cn}(y^n)$ from $\{1, 2, \ldots, M\}$, and finally, these integers
are decoded as blocks of source symbols $\varphi_n^*(y^n) = \varphi_{sn}[\varphi_{cn}(y^n)]$ of length $tn$. Thus, the
probability of erroneously decoding the block is

$$P_e^{(n)}(Q, W, t) \triangleq \sum_{\{(s^{tn}, y^n) :\, \varphi_{sn}[\varphi_{cn}(y^n)] \ne s^{tn}\}} Q_{tn}(s^{tn})\, P_{n,Y|X}\left( y^n \,\middle|\, f_{cn}[f_{sn}(s^{tn})] \right),$$

where $Q_{tn}$ and $P_{n,Y|X}$ are the $tn$- and $n$-dimensional product distributions corresponding
to $Q$ and $P_{Y|X}$, respectively.

Definition 2 The tandem coding error exponent $E_T(Q,W,t)$ is defined as the largest
number $E$ for which there exists a sequence of tandem codes $(f_n^*, \varphi_n^*) = (f_{cn} \circ f_{sn}, \varphi_{sn} \circ \varphi_{cn})$
with transmission rate $t$ and block length $n$ such that

$$E \le \liminf_{n \to \infty} -\frac{1}{n} \log P_e^{(n)}(Q, W, t).$$

When there is no possibility of confusion, $E_T(Q,W,t)$ will often be written as $E_T$. In
general, we know that $E_J \ge E_T$ since by definition tandem coding is a special case of
JSCC. We are hence interested in determining the conditions for which $E_J > E_T$ for the
same transmission rate $t$. Meanwhile, it immediately follows (from the JSCC theorem)
that $E_T$ can be positive if and only if $tH(Q) < C$; otherwise, both $E_J$ and $E_T$ are zero.

By definition, the tandem coding exponent results from separately performing and
concatenating optimal source and channel coding, which can be expressed by (e.g., see
[17])

$$E_T(Q, W, t) = \sup_{R_s, R_c :\, R_c = tR_s} \min\left\{ te(R_s, Q),\, E(R_c, W) \right\}$$

$$= \sup_{R} \min\left\{ te\left(\frac{R}{t}, Q\right),\, E(R, W) \right\}, \tag{54}$$
where $e(R, Q)$ and $E(R, W)$ are the source and channel error exponents, respectively.
Note that

$$\sup_{R \le t\log|\mathcal{S}|} te\left(\frac{R}{t}, Q\right) = te(\log|\mathcal{S}|, Q) = -t\log\left( |\mathcal{S}|\, \bar{Q}(s) \right),$$

where $\bar{Q}(s)$ is the geometric mean of the source probabilities, i.e.
$\bar{Q}(s) = \left( \prod_{s \in \mathcal{S}} Q(s) \right)^{1/|\mathcal{S}|} \le 1/|\mathcal{S}|$. If $-t\log(|\mathcal{S}|\bar{Q}(s)) \ge E(t\log|\mathcal{S}|, W)$, then the graphs
of $te(R/t,Q)$ and $E(R,W)$ must have exactly one intersection $R_0$ and by (54)

$$E_T(Q, W, t) = te\left(\frac{R_0}{t}, Q\right) = E(R_0, W), \tag{55}$$

since $te(R/t,Q)$ is strictly increasing in $R \in [tH(Q), t\log|\mathcal{S}|]$ and $E(R, W)$ is
non-increasing in $R$. If $-t\log(|\mathcal{S}|\bar{Q}(s)) < E(t\log|\mathcal{S}|, W)$, then there is no intersection
between $te(R/t, Q)$ and $E(R, W)$. Recall (24) that $te(R/t, Q)$ is infinite in the open interval
$(t\log|\mathcal{S}|, \infty)$. In this case, we have that

$$E_T(Q, W, t) = E(t\log|\mathcal{S}|, W) \tag{56}$$

by (54). Without loss of generality, we denote

$$R_0 \triangleq \begin{cases} \text{the rate satisfying } te\left(\frac{R_0}{t}, Q\right) = E(R_0, W) & \text{if } -t\log(|\mathcal{S}|\bar{Q}(s)) \ge E(t\log|\mathcal{S}|, W), \\ t\log|\mathcal{S}| & \text{if } -t\log(|\mathcal{S}|\bar{Q}(s)) < E(t\log|\mathcal{S}|, W), \end{cases} \tag{57}$$

so that we can always write $E_T(Q, W, t) = E(R_0, W)$.

When the DMS is uniform, the optimal source coding operation reduces to the trivial
enumerating (identity) function with $M = |\mathcal{S}|^{tn}$, as the source is incompressible. Hence
only channel coding is performed in both JSCC and tandem coding and $E_J(Q,W,t) =
E_T(Q,W,t) = E(t\log|\mathcal{S}|, W)$. Thus, our comparison of the two exponents is nontrivial
only if the source is nonuniform and $tH(Q) < C$. Even though we know that $E_J$ is never
worse than $E_T$, the following theorem gives a limit on how much $E_J$ can outperform $E_T$.

Theorem 5 The JSCC exponent can at most be equal to double the tandem coding exponent,
i.e.,

$$E_J(Q, W, t) \le 2E_T(Q, W, t),$$

with equality if $tR_{cr}^{(s)}(Q) \ge R_{cr}(W)$ and $T_{sp}(\overline{\rho}^*, W) = tE_s(\overline{\rho}^*, Q) + 2tD(Q^{(\overline{\rho}^*)} \,\|\, Q)$.

Remark 6 Equivalently, this upper bound also implies that $E_J$ can at most exceed $E_T$
by $E_J/2$, i.e.,

$$E_J(Q, W, t) - E_T(Q, W, t) \le \frac{1}{2} E_J(Q, W, t). \tag{58}$$

Proof: We first refer to the upper bound of $E_J(Q, W,t)$ given by Csiszar [17, Lemma 2]

$$E_J(Q, W, t) \le \min_{tH(Q) \le R \le t\log|\mathcal{S}|} \left[ te\left(\frac{R}{t}, Q\right) + E(R, W) \right], \tag{59}$$

where $te(R/t, Q)$ is the source error exponent, which is strictly convex and increasing
in $[tH(Q), t\log|\mathcal{S}|]$, and $E(R,W)$ is the channel error exponent, which is positive and
non-increasing in $[0, C)$. Unlike the source exponent, the behavior of $E(R, W)$ is unknown
for $R < R_{cr}(W)$. Let $C_0$ be the zero-error capacity of the channel $W$, i.e., $E(R, W) = \infty$
if and only if $R < C_0$ [23]. If $C_0 > t\log|\mathcal{S}|$, obviously, we have

$$E_J(Q, W, t) = E_T(Q, W, t) = +\infty.$$

If $C_0 \le t\log|\mathcal{S}|$, the upper bound in (59) is finite and the minimum must be achieved by
some rate, say $R_m$, in the interval $[C_0, t\log|\mathcal{S}|]$. Then

$$E_J(Q, W, t) \overset{(a)}{\le} te\left(\frac{R_m}{t}, Q\right) + E(R_m, W)$$
$$\overset{(b)}{\le} te\left(\frac{R_0}{t}, Q\right) + E(R_0, W)$$
$$\overset{(c)}{\le} 2E(R_0, W)$$
$$= 2E_T(Q, W, t).$$

Here, the equality in (a) holds if our computable upper and lower bounds, $\overline{E}_{sp}(Q, W, t)$
and $\underline{E}_r(Q,W,t)$, are equal. To ensure this, we need the condition $tR_{cr}^{(s)}(Q) \ge R_{cr}(W)$
by Theorem 2. The equality in (b) holds if $R_m = R_0$ by definition of $R_m$. The equality
in (c) holds if and only if there is an intersection between $te(R/t, Q)$ and $E(R,W)$,
i.e., $te(R_0/t, Q) = E(R_0, W)$. Now taking these considerations together, and applying
Theorem 2 again, we conclude that $E_J = 2E_T$ if $tR_{cr}^{(s)}(Q) \ge R_{cr}(W)$ and
$T_{sp}(\overline{\rho}^*, W) - tE_s(\overline{\rho}^*, Q) = 2te(\overline{R}_m/t, Q) = 2tD(Q^{(\overline{\rho}^*)} \,\|\, Q)$. ■



Observation 4 The condition for the equality states that, if the minimum in the expression
of $\underline{E}_r(Q, W, t)$ given in (29) is attained at the intersection of $te(R/t, Q)$ and $E_r(R, W)$,
which is no less than the critical rate of the channel, then the JSCC exponent is twice
as large as the tandem coding exponent. In that case, the rate of decay of the error
probability for the JSCC system is double that for the tandem coding system. In other
words, for the same probability of error $P_e$, the delay of (optimal) JSCC is approximately
half of the delay of (optimal) tandem coding,

$$P_e \approx 2^{-nE_T(Q, W, t)} = 2^{-\frac{n}{2} E_J(Q, W, t)} \quad \text{for } n \text{ sufficiently large}.$$

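
The factor-of-two relation can be explored numerically for a binary source over a BSC with
the Python sketch below. It is an illustration under stated assumptions, not a general
implementation: the JSCC exponent is replaced by its sphere-packing form
$\max_\rho [E_0(\rho, W) - tE_s(\rho, Q)]$ with uniform input, the tandem exponent by
$\sup_R \min\{te(R/t, Q), E_{sp}(R, W)\}$ located by bisection, and both optimizations over $\rho$
are restricted to a finite grid on $[0, 5]$. These substitutes coincide with $E_J$ and $E_T$ only when
the optimizing rates are at least $R_{cr}(W)$, and the sketch assumes $tH(Q) < C$. All names are
hypothetical; exponents are in nats.

```python
import numpy as np

RHO = np.linspace(0.0, 5.0, 5001)  # assumption: the optimizing rho lies in [0, 5]

def e0_bsc(rho, eps):
    """Gallager's E_0 for a BSC with uniform input."""
    a = eps ** (1.0 / (1.0 + rho)) + (1.0 - eps) ** (1.0 / (1.0 + rho))
    return rho * np.log(2.0) - (1.0 + rho) * np.log(a)

def es_binary(rho, q):
    """Gallager's source function E_s for the binary source {q, 1 - q}."""
    return (1.0 + rho) * np.log(q ** (1.0 / (1.0 + rho)) + (1.0 - q) ** (1.0 / (1.0 + rho)))

def entropy_nats(p):
    return -p * np.log(p) - (1.0 - p) * np.log(1.0 - p)

def jscc_upper(eps, q, t):
    """Grid maximum of E_0(rho, W) - t E_s(rho, Q): sphere-packing form of the JSCC bound."""
    return float(np.max(e0_bsc(RHO, eps) - t * es_binary(RHO, q)))

def source_exponent(r, q, t):
    """t e(R/t, Q) as the convex transform sup_rho [rho R - t E_s(rho, Q)]."""
    return float(np.max(RHO * r - t * es_binary(RHO, q)))

def sphere_packing(r, eps):
    """E_sp(R, W) = sup_rho [E_0(rho, W) - rho R]."""
    return float(np.max(e0_bsc(RHO, eps) - RHO * r))

def tandem_upper(eps, q, t, iters=60):
    """sup_R min{t e(R/t, Q), E_sp(R, W)}, found at the crossing of the two curves."""
    lo = t * entropy_nats(q)
    hi = min(np.log(2.0) - entropy_nats(eps), t * np.log(2.0))
    if source_exponent(hi, q, t) < sphere_packing(hi, eps):
        return sphere_packing(hi, eps)  # no crossing: exponent taken at the upper end
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if source_exponent(mid, q, t) < sphere_packing(mid, eps):
            lo = mid
        else:
            hi = mid
    return sphere_packing(lo, eps)

def exponent_ratio(eps, q, t):
    return jscc_upper(eps, q, t) / tandem_upper(eps, q, t)

ratio_symmetric = exponent_ratio(0.1, 0.1, 1.0)
ratio_other = exponent_ratio(0.05, 0.1, 1.0)
```

In the symmetric case $q = \varepsilon = 0.1$, $t = 1$, the two curves cross exactly at the minimizing
rate, so the computed ratio should be close to the limiting value 2.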
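4.2 Sufficient Conditions for which $E_J > E_T$

In the following we will use our previous results to derive computable sufficient conditions
for which $E_J > E_T$. We first define

$$\gamma \triangleq \begin{cases} \text{the root of } tH(Q^{(\gamma)}) = R_{cr}(W) & \text{if } tH(Q) < R_{cr}(W) < t\log|\mathcal{S}|, \\ 0 & \text{if } tH(Q) \ge R_{cr}(W), \end{cases} \tag{60}$$

such that the source error exponent $te(R/t,Q)$ has a parametric expression at $R_{cr}(W)$

$$te\left(\frac{R_{cr}(W)}{t}, Q\right) = tD(Q^{(\gamma)} \,\|\, Q). \tag{61}$$

Note that $\gamma$ is well defined only if $R_{cr}(W) < t\log|\mathcal{S}|$. Denote

$$T(\overline{\rho}^*) \triangleq T_{sp}(\overline{\rho}^*, W) - tE_s(\overline{\rho}^*, Q). \tag{62}$$

Theorem 6 Let $R_{cr}(W) < t\log|\mathcal{S}|$. If

$$\max\left\{ tR_{cr}^{(s)}(Q),\; E_0(1, W) - tD(Q^{(\gamma)} \,\|\, Q) \right\} \ge R_{cr}(W), \tag{63}$$

then

$$E_J(Q, W, t) > E_T(Q, W, t).$$

More precisely, we have the following bounds.

(a) If $tR_{cr}^{(s)}(Q) \ge R_{cr}(W)$ and $E_0(1, W) - tD(Q^{(\gamma)} \,\|\, Q) \ge R_{cr}(W)$, then

$$E_J(Q, W, t) - E_T(Q, W, t) \ge \frac{1}{2} T(\overline{\rho}^*) - \left| \frac{1}{2} T(\overline{\rho}^*) - tD(Q^{(\overline{\rho}^*)} \,\|\, Q) \right| \ge 0, \tag{64}$$

where the two equalities in (64) cannot hold simultaneously.

(b) If $tR_{cr}^{(s)}(Q) \ge R_{cr}(W) > E_0(1, W) - tD(Q^{(\gamma)} \,\|\, Q)$, then

$$E_J(Q, W, t) - E_T(Q, W, t) > T(\overline{\rho}^*) - tD(Q^{(\gamma)} \,\|\, Q) \ge 0. \tag{65}$$

(c) If $E_0(1, W) - tD(Q^{(\gamma)} \,\|\, Q) \ge R_{cr}(W) > tR_{cr}^{(s)}(Q)$, then

$$E_J(Q, W, t) - E_T(Q, W, t) \ge R_{cr}(W) - tE_s(1, Q) > 0. \tag{66}$$
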

Proof: We shall show that, in each of the three cases, (a), (b), and (c), we have $E_J > E_T$.

(a) Assume $tR_{cr}^{(s)}(Q) \ge R_{cr}(W)$ and $E_0(1, W) - tD(Q^{(\gamma)} \,\|\, Q) \ge R_{cr}(W)$. By definition of
$\gamma$, we have $tD(Q^{(\gamma)} \,\|\, Q) = te(R_{cr}(W)/t, Q)$, see (24) and (61). Thus, the latter condition
is equivalent to $E(R_{cr}(W), W) \ge te(R_{cr}(W)/t, Q)$ and by (16) and the related discussion
it guarantees that $R_0 \ge R_{cr}(W)$, where $R_0$ is defined in (57). According to Theorem 2,
when $tR_{cr}^{(s)}(Q) \ge R_{cr}(W)$, $\overline{E}_{sp}(Q, W, t)$ is attained by $\overline{R}_m \ge R_{cr}(W)$ and $E_J$ is determined
by

$$E_J(Q, W, t) = te\left(\frac{\overline{R}_m}{t}, Q\right) + E_{sp}(\overline{R}_m, W).$$

Since $R_0 \ge R_{cr}(W)$, $E_T$ is determined by $E_{sp}(R_0, W)$. If $R_0 \ne \overline{R}_m$, we must have

$$E_T(Q, W, t) < \max\left\{ te\left(\frac{\overline{R}_m}{t}, Q\right),\, E_{sp}(\overline{R}_m, W) \right\},$$

because $te(R/t, Q)$ is strictly increasing and $E_{sp}(R, W)$ is strictly decreasing at $\overline{R}_m$. Thus,

$$E_J(Q, W, t) - E_T(Q, W, t) > \min\left\{ te\left(\frac{\overline{R}_m}{t}, Q\right),\, E_r(\overline{R}_m, W) \right\} \ge 0, \tag{67}$$

where equality holds if $\overline{R}_m = C$. If $R_0 = \overline{R}_m$, then immediately,

$$E_J(Q, W, t) - E_T(Q, W, t) = te\left(\frac{\overline{R}_m}{t}, Q\right) = tD(Q^{(\overline{\rho}^*)} \,\|\, Q), \tag{68}$$

where the above is positive since $\overline{\rho}^* > 0$ by Lemma 5 (1). Note also that in this case
$te(\overline{R}_m/t, Q) = E_r(\overline{R}_m, W)$, so (67) and (68) can be summarized by (64).

(b) In this case, we have $\overline{R}_m \ge R_{cr}(W) > R_0$. We can upper bound $E_T$ by

$$E_T(Q, W, t) = te\left(\frac{R_0}{t}, Q\right) < te\left(\frac{R_{cr}(W)}{t}, Q\right) = tD(Q^{(\gamma)} \,\|\, Q)$$

and hence

$$E_J(Q, W, t) - E_T(Q, W, t) > T_{sp}(\overline{\rho}^*, W) - tE_s(\overline{\rho}^*, Q) - tD(Q^{(\gamma)} \,\|\, Q).$$

The above lower bound must be nonnegative since

$$T_{sp}(\overline{\rho}^*, W) - tE_s(\overline{\rho}^*, Q) - tD(Q^{(\gamma)} \,\|\, Q) = E_r(\overline{R}_m, W) + t\left[ e\left(\frac{\overline{R}_m}{t}, Q\right) - e\left(\frac{R_{cr}(W)}{t}, Q\right) \right]$$
$$\ge E_r(\overline{R}_m, W)$$
$$\ge 0,$$

and it is equal to 0 if $R_{cr}(W) = \overline{R}_m = C$.

(c) In this case, we have $R_0 \ge R_{cr}(W) > \underline{R}_m$ and from (41) $E_J$ is bounded by

$$E_J(Q, W, t) \ge E_0(1, W) - tE_s(1, Q).$$

On the other hand, by the monotonicity of $E_r(R, W)$, we can upper bound $E_T$ by

$$E_T(Q, W, t) = E_r(R_0, W) \le E_r(R_{cr}(W), W) = E_0(1, W) - R_{cr}(W).$$

Thus we obtain

$$E_J(Q, W, t) - E_T(Q, W, t) \ge R_{cr}(W) - tE_s(1, Q).$$

The above is positive since

$$E_0(1, W) - tE_s(1, Q) = te\left(\frac{\underline{R}_m}{t}, Q\right) + E_r(\underline{R}_m, W)$$
$$> E_r(\underline{R}_m, W)$$
$$\ge E_r(R_{cr}(W), W)$$
$$= E_0(1, W) - R_{cr}(W),$$

where the first inequality follows from the fact that $\underline{R}_m > tH(Q)$ by Lemma 5 and
Corollary 1.

■

As pointed out in the proof, the condition $tR_{cr}^{(s)}(Q) \ge R_{cr}(W)$ means that the JSCC
exponent $E_J$ is achieved at a rate no less than $R_{cr}(W)$. The second condition,
$E_0(1, W) - tD(Q^{(\gamma)} \,\|\, Q) \ge R_{cr}(W)$, means that the tandem coding exponent $E_T$ is achieved at a
rate no less than $R_{cr}(W)$. Hence (63) in Theorem 6 states that $E_J$ would be strictly
larger than $E_T$ if either $E_J$ or $E_T$ is determined exactly. Conversely, if the conditions in
Theorem 6 are not satisfied, then neither $E_J$ nor $E_T$ is exactly known. Nevertheless, if
the lower bound of $E_J$ is strictly larger than the upper bound of $E_T$, then we must have
$E_J > E_T$. Hence we obtain the following sufficient conditions.

Theorem 7 Let $E_{ex}(0, W) < \infty$ and let $t\log|\mathcal{S}| > R_{cr}(W)$, where $E_{ex}(R,W)$ is the
expurgated channel error exponent [23]. If

$$E_0(1, W) - tE_s(1, Q) > \frac{k_1 k_2\, t\log|\mathcal{S}| + k_1 E_{ex}(0, W) + k_2\, t\log\left( |\mathcal{S}|\bar{Q}(s) \right)}{k_1 - k_2},$$

where

$$k_1 = \frac{D(Q^{(\gamma)} \,\|\, Q) + \log\left( |\mathcal{S}|\bar{Q}(s) \right)}{H(Q^{(\gamma)}) - \log|\mathcal{S}|}$$

and

$$k_2 = \frac{E_0(1, W) - E_{ex}(0, W)}{R_{cr}(W)} - 1,$$

then $E_J(Q,W,t) > E_T(Q,W,t)$.

Theorem 8 Let $t\log|\mathcal{S}| > R_{cr}(W)$. If $E_0(1, W) - tE_s(1, Q) > tD(Q^{(\gamma)} \,\|\, Q)$, where $\gamma$ is
defined in (60), then $E_J(Q, W,t) > E_T(Q, W,t)$.

In Theorems 7 and 8, we establish the sufficient conditions by comparing the
source-channel random-coding bound derived in Theorem 2 with the upper bound of the tandem
coding exponent obtained by using the geometric characteristics of $e(R, Q)$ and $E(R, W)$.
The proofs of Theorems 7 and 8 are given in Appendices B and C, respectively. These
conditions can be readily computed since they only require the knowledge of $R_{cr}(W)$ and
$E_{ex}(0, W)$. Note that the condition $E_{ex}(0, W) < \infty$ in Theorem 7 is satisfied by the DMCs
with zero-error capacity equal to 0, see [19, p. 187]. Thus, Theorem 7 applies to equidistant
channels, in particular, to every channel with binary input alphabet. An expression of
$E_{ex}(0, W)$ for the DMC with zero-error capacity 0 is given in [23, Problem 5.24].

Example 4 (When Does the JSCC Exponent Outperform the Tandem Coding 
Exponent?) We apply Theorems 6, 7 and 8 to the binary DMS with distribution {q, 1—q} 
and BSC with crossover probability e, and the binary DMS {q, 1 — q} and the binary 
erasure channel (BEC) with erasure probability a, under different transmission rates t. If 
any one of the conditions in these theorems holds, then Ej > Et- The above conditions 
are summarized by Region F in Fig. 7. Indeed, Region F shows that Ej > Ex for a wide 
range of (e, q) or (a, q) pairs. Region G consists of the pairs (e, q) or (a, q) such that 
tH(Q) > C; in this case, Ej = E T = 0. Finally, when (e, q) or (a,q) falls in Region H, 
we are not sure whether Ej is still strictly larger than E T . 

Example 5 (By How Much Can the JSCC Exponent Be Larger Than the
Tandem Coding Exponent?) In the last example we have seen that $E_J > E_T$ holds for
a wide class of source-channel pairs. We now evaluate the gain of $E_J$ over $E_T$
by looking at the ratio of the two quantities. Recall that when Theorem 6 (a) is satisfied,
both $E_J$ and $E_T$ are exactly determined. In this case we can directly compute $E_J$ (using
the results of Section 3) and $E_T$ (using (55) and (56)). When $E_J$ (respectively, $E_T$) is not
known, i.e., when $tR_{cr}^{(s)}(Q) < R_{cr}(W)$ (respectively, $E_0(1,W) - tD(Q^{(\gamma)} \,\|\, Q) < R_{cr}(W)$),
we can calculate the lower bound of $E_J$ (respectively, the upper bound of $E_T$) instead
and thus obtain a lower bound for $E_J/E_T$. For general DMCs, we lower bound $E_J$ by its
random-coding lower bound $\underline{E}_r(Q,W,t)$. For equidistant DMCs, particularly for binary
DMCs, when $tR_{cr}^{(s)}(Q) < R_{ex}(W)$, we use the expurgated lower bound $\underline{E}_{ex}(Q,W,t)$; when
$tR_{cr}^{(s)}(Q) > R_{ex}(W)$, we use the random-coding lower bound $\underline{E}_r(Q,W,t)$. To calculate the
upper bound of $E_T$, when $E_0(1,W) - tD(Q^{(\gamma)} \,\|\, Q) < R_{cr}(W) < tR_{cr}^{(s)}(Q)$, or equivalently
when $R_o < R_{cr}(W) < \overline{R}_m$, we can bound $E_T$ by

$$E_T(Q,W,t) \le \min\left\{ tD(Q^{(\gamma)} \,\|\, Q),\; E_{sp}(R_s, W) \right\},$$

where $R_s$ is the intersection of $E_{sp}(R,W)$ and $te(R/t,Q)$ if any; otherwise $R_s = t\log|S|$.
When $E_0(1,W) - tD(Q^{(\gamma)} \,\|\, Q) < R_{cr}(W)$ and $tR_{cr}^{(s)}(Q) < R_{cr}(W)$, we bound $E_T$ by

$$E_T(Q,W,t) \le E_{sp}(R_s, W).$$

Table 1 exhibits $E_J/E_T$ (or its lower bound, which must be no less than 1) for the binary
DMS $\{q, 1-q\}$ and BSC ($\varepsilon$) system under transmission rates $t = 0.5$, $0.75$ and $1$. It
is seen that the ratio $E_J/E_T$ can be very close to 2 (its upper bound) for many $(q, \varepsilon)$
pairs. For other systems, we have similar results: $E_J$ substantially outperforms $E_T$. For
instance, for the binary DMS $\{q, 1-q\}$ and BEC ($\alpha$) with $t = 1$, we note that $E_J/E_T > 1.4$
for a wide range of $(q, \alpha)$ pairs; for a ternary DMS and a BSC, or for a DMS and a ternary symmetric
channel, if the transmission rate $t$ is chosen suitably (such that $tH(Q) < C$), we obtain
$E_J/E_T > 1.5$ for many source-channel pairs.

4.3 Power Gain Due to JSCC for DMS over Binary-input AWGN and Rayleigh-Fading Channels with Finite Output Quantization

It is well known that M-ary modulated additive white Gaussian noise (AWGN) and
memoryless Rayleigh-fading channels can be converted to DMCs when finite quantization
is applied at their output. For example, as illustrated in [4], [41], the
concatenation of a binary phase-shift keying (BPSK) modulated AWGN or Rayleigh-fading
channel with $m$-bit soft-decision demodulation is equivalent to a binary-input,
$2^m$-output DMC (cf. Fig. 8). We next study the JSCC and tandem coding exponents for
a system involving such channels to assess the potential benefits of JSCC over tandem
coding in terms of power or channel signal-to-noise ratio (SNR) gains.

We assume that the BPSK signal $U_n \in \{-1,+1\}$ corresponding to the signal input
$X_n$ is of unit energy, and that $\{V_n\}$ is a zero-mean independent and identically distributed (i.i.d.)
Gaussian random process with variance $N_0/2$. The channel SNR is defined by
$\mathrm{SNR} = E[U_n^2]/E[V_n^2] = 2/N_0$ and the received signal is

$$Z_n = A_n U_n + V_n, \qquad n = 1, 2, \ldots,$$

where $A_n$ is 1 for the AWGN channel (no fading), and for the Rayleigh-fading channel,
$\{A_n\}$ is the amplitude fading process assumed to be i.i.d. with probability density function
(pdf)

$$f_A(a) = \begin{cases} 2a e^{-a^2}, & \text{if } a > 0, \\ 0, & \text{otherwise,} \end{cases}$$

such that $E[A_n^2] = 1$. We also assume for the Rayleigh-fading channel that $A_n$, $U_n$ and
$V_n$ are independent of each other, and that the values of $A_n$ are not available at the receiver.
At the receiver, as shown in Fig. 8, each $Z_n \in \mathbb{R}$ is demodulated via an $m$-bit uniform
scalar quantizer with quantization step $\Delta$ to yield $Y_n \in \{0,1\}^m$. If the channel input
alphabet is $\mathcal{X} = \{0, 1\}$ and the channel output alphabet is $\mathcal{Y} = \{0, 1, 2, \ldots, 2^m - 1\}$, then
the transition probability matrix $\Pi$ is given by

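$$\Pi = [\pi_{ij}], \qquad i \in \mathcal{X},\; j \in \mathcal{Y},$$

where

$$\pi_{ij} \triangleq P(Y = j \mid X = i) = Q\big((T_{j-1} - (2i-1))\sqrt{\mathrm{SNR}}\big) - Q\big((T_j - (2i-1))\sqrt{\mathrm{SNR}}\big)$$

for the AWGN channel [41], and

$$\pi_{ij} \triangleq P(Y = j \mid X = i) = F_{Z|X}(T_j \mid i) - F_{Z|X}(T_{j-1} \mid i)$$

for the Rayleigh-fading channel [4]. Here $F_{Z|X}(z \mid i) = \Pr\{Z \le z \mid X = i\}$ is given by [4],
[49]

$$F_{Z|X}(z \mid 1) = 1 - F_{Z|X}(-z \mid 0) = 1 - Q\!\left(\frac{z}{\sqrt{N_0/2}}\right) - \frac{e^{-z^2/(N_0+1)}}{\sqrt{N_0+1}} \left[ 1 - Q\!\left(\frac{z}{\sqrt{N_0(N_0+1)/2}}\right) \right],$$

where $Q(x)$ is the complementary error function

$$Q(x) = \frac{1}{\sqrt{2\pi}} \int_x^{\infty} \exp\{-t^2/2\}\, dt,$$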

and $\{T_j\}$ are the thresholds of the receiver's soft-decision quantizer given by

$$T_j = \begin{cases} -\infty, & \text{if } j = -1, \\ (j + 1 - 2^{m-1})\Delta, & \text{if } j = 0, 1, \ldots, 2^m - 2, \\ +\infty, & \text{if } j = 2^m - 1 \end{cases} \qquad (69)$$

with uniform step-size $\Delta$. For each channel SNR, the suitable quantization step $\Delta$ is
chosen as in [41], [4] to yield the maximum capacity of the binary-input $2^m$-output DMC.
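The AWGN case of the construction above can be sketched numerically. The following Python fragment is only an illustration under stated assumptions: the function names (`q_func`, `thresholds`, `awgn_transition_matrix`, `uniform_input_information`, `best_step`) are hypothetical, the SNR is passed in dB, and the step size is selected by a plain search over a user-supplied candidate list using the mutual information under equiprobable inputs as the capacity proxy (reasonable here because the quantizer thresholds are symmetric about zero). It is not the procedure of [41] or [4].

```python
import math


def q_func(x):
    """Gaussian tail probability Q(x) = P(N(0, 1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def thresholds(m, step):
    """Return [T_{-1}, T_0, ..., T_{2^m - 1}] for the uniform quantizer in (69)."""
    levels = 2 ** m
    inner = [(j + 1 - levels // 2) * step for j in range(levels - 1)]
    return [-math.inf] + inner + [math.inf]


def awgn_transition_matrix(m, step, snr_db):
    """Rows are inputs i = 0, 1; columns are quantizer outputs j = 0..2^m - 1."""
    root_snr = math.sqrt(10.0 ** (snr_db / 10.0))
    t = thresholds(m, step)  # t[j] holds T_{j-1}, t[j + 1] holds T_j
    matrix = []
    for i in (0, 1):
        u = 2 * i - 1  # BPSK symbol associated with input i
        row = [q_func((t[j] - u) * root_snr) - q_func((t[j + 1] - u) * root_snr)
               for j in range(2 ** m)]
        matrix.append(row)
    return matrix


def uniform_input_information(matrix):
    """Mutual information (bits) of the DMC under equiprobable inputs."""
    total = 0.0
    for j in range(len(matrix[0])):
        p_y = 0.5 * (matrix[0][j] + matrix[1][j])
        for i in (0, 1):
            p = matrix[i][j]
            if p > 0.0:
                total += 0.5 * p * math.log2(p / p_y)
    return total


def best_step(m, snr_db, candidates):
    """Pick the candidate step size with the largest uniform-input information."""
    return max(candidates,
               key=lambda d: uniform_input_information(
                   awgn_transition_matrix(m, d, snr_db)))
```

For $m = 1$ the single threshold is $T_0 = 0$ and the matrix reduces to a BSC with crossover probability $Q(\sqrt{\mathrm{SNR}})$, which gives a quick sanity check of the indexing.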

We compute the JSCC and tandem coding exponents for the binary source and the
binary-input $2^m$-output DMC converted from the AWGN (Rayleigh-fading, respectively)
channel under transmission rate $t = 0.75$ ($t = 1$, respectively), and illustrate the power
gain due to JSCC. In Figs. 9 and 10, we plot $E_J$ and $E_T$ for the binary DMS $Q = \{0.1, 0.9\}$
and $m = 1, 2, 3$ by varying the channel SNR (in dB). We point out that in both
figures, when SNR < 6 dB for $m = 2, 3$ and when SNR < 8 dB for $m = 1$, $E_J$ and $E_T$
are determined exactly. We observe that for the same SNR, $E_J$ is almost twice as large
as $E_T$. Furthermore, for the same exponent and the same (asymptotic) encoding length,
JSCC would yield the same probability of error as tandem coding with a power gain of
more than 2 dB. A similar behavior was noted for other values of the transmission rate $t$.

5 JSCC Error Exponent with Hamming Distortion Measure

Let $S$ be a finite set and $d(\cdot,\cdot)$ be a distortion measure, i.e., a nonnegative valued function
$d$ defined on $S \times S$ and extended to $S^n \times S^n$ by setting

$$d(s^n, \hat{s}^n) = \frac{1}{n} \sum_{i=1}^{n} d(s_i, \hat{s}_i).$$

A JSC code with blocklength $n$ and transmission rate $t > 0$ for a $tn$-length DMS
$\{Q : S\}$ and a DMC $\{W : \mathcal{X} \to \mathcal{Y}\}$ with a threshold $\Delta$ of tolerated distortion is a pair of
mappings $f_n : S^{tn} \to \mathcal{X}^n$ and $\varphi_n : \mathcal{Y}^n \to S^{tn}$. The probability of the code exceeding
the threshold $\Delta$ is given by

$$P_{\Delta}^{(n)}(Q,W,t) \triangleq \sum_{\{(s^{tn}, y^n)\,:\, d(s^{tn}, \varphi_n(y^n)) > \Delta\}} Q_{tn}(s^{tn})\, P_{n,Y|X}\big(y^n \mid f_n(s^{tn})\big),$$

where $Q_{tn}$ and $P_{n,Y|X}$ are the $tn$- and $n$-dimensional product distributions corresponding
to $Q$ and $P_{Y|X}$, respectively. $P_{\Delta}^{(n)}(Q,W,t)$ is also called the probability of excess distortion.
We remark that for JSCC with a distortion threshold, we allow the source to have a
uniform distribution.
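To make the excess-distortion probability concrete, consider a deliberately simple scheme that is not analyzed in this paper: uncoded transmission of a binary source over a BSC with crossover probability `eps` at rate $t = 1$, with identity encoder and decoder and Hamming distortion. The number of symbol errors is then binomial, so $P_{\Delta}^{(n)}$ is a binomial tail. The sketch below (hypothetical function names, logarithms in base 2) evaluates that tail and the finite-$n$ exponent estimate $-\frac{1}{n}\log_2 P_{\Delta}^{(n)}$, which for $\Delta >$ `eps` approaches the binary divergence between $\Delta$ and `eps` by standard large-deviation arguments.

```python
import math


def excess_distortion_uncoded(n, eps, delta):
    """P(normalized Hamming distortion > delta) for uncoded use of a BSC(eps), t = 1."""
    return sum(math.comb(n, k) * eps ** k * (1.0 - eps) ** (n - k)
               for k in range(n + 1) if k / n > delta)


def exponent_estimate(n, eps, delta):
    """Finite-n estimate -(1/n) log2 P of the excess-distortion exponent."""
    return -math.log2(excess_distortion_uncoded(n, eps, delta)) / n


def binary_divergence(a, b):
    """D(a || b) in bits between Bernoulli(a) and Bernoulli(b)."""
    return a * math.log2(a / b) + (1.0 - a) * math.log2((1.0 - a) / (1.0 - b))
```

The uncoded scheme is only a baseline for intuition; the exponent $E_J^{\Delta}$ defined next refers to the best sequence of JSC codes.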

Definition 3 The JSCC error exponent $E_J^{\Delta}(Q,W,t)$ is defined as the largest number $E^{\Delta}$
for which there exists a sequence of JSC codes $(f_n, \varphi_n)$ with blocklength $n$ and transmission
rate $t$ such that

$$E^{\Delta} \le \liminf_{n \to \infty} -\frac{1}{n} \log P_{\Delta}^{(n)}(Q,W,t).$$

When there is no possibility of confusion, $E_J^{\Delta}(Q,W,t)$ will often be written $E_J^{\Delta}$. In
[18], Csiszár proved that for a DMS $Q$ and a DMC $W$, the JSCC error exponent under
distortion threshold $\Delta$ satisfies

$$\underline{E}_r^{\Delta}(Q,W,t) \le E_J^{\Delta}(Q,W,t) \le \overline{E}_{sp}^{\Delta}(Q,W,t), \qquad (70)$$

where

$$\underline{E}_r^{\Delta}(Q,W,t) \triangleq \inf_{R > 0} \left[ tF\!\left(\tfrac{R}{t}, Q, \Delta\right) + E_r(R,W) \right] \qquad (71)$$

and

$$\overline{E}_{sp}^{\Delta}(Q,W,t) \triangleq \inf_{R > 0} \left[ tF\!\left(\tfrac{R}{t}, Q, \Delta\right) + E_{sp}(R,W) \right]. \qquad (72)$$

In the above,

$$F(R,Q,\Delta) = \inf_{P\,:\,R(P,\Delta) \ge R} D(P \,\|\, Q) \qquad (73)$$

is the source error exponent with a fidelity criterion [37] and $R(P,\Delta)$ is the rate-distortion
function (e.g., [16], [19]). $E_r(R,W)$ and $E_{sp}(R,W)$ are the random-coding and sphere-packing
bounds to the channel error exponent. As in the lossless case, if the infimum in (71) or (72) is
attained for a rate larger than the channel critical rate, then the lower and upper bounds
coincide, and we can determine $E_J^{\Delta}$ exactly. Of course, the two bounds are nontrivial if
and only if $tR(Q,\Delta) < C$ by the JSCC theorem.

It can be shown that $F(R,Q,\Delta)$ is a nondecreasing function in $R$. However, unlike
$e(R,Q)$, $F(R,Q,\Delta)$ is not necessarily convex or even continuous in $R$ [1], [37]. Therefore,
it is hard to analytically compute the JSCC exponent $E_J^{\Delta}$ in general. In this section we
only address the computation of $E_J^{\Delta}$ for a binary DMS and an arbitrary DMC under the
Hamming distortion measure $d_H(\cdot,\cdot)$, given by

$$d_H(s, \hat{s}) = \begin{cases} 1, & \text{if } s \ne \hat{s}, \\ 0, & \text{if } s = \hat{s}. \end{cases} \qquad (74)$$

We first need to derive a parametric form of $F(R,Q,\Delta)$. Define

$$E^{\Delta}(\rho, Q) = (1+\rho) \log\!\left( q^{\frac{1}{1+\rho}} + (1-q)^{\frac{1}{1+\rho}} \right) - \rho\, h_b(\Delta). \qquad (75)$$

Lemma 7 For a binary DMS $Q = \{q, 1-q\}$ ($q \le 1/2$) under the Hamming distortion
measure (74) and distortion threshold $\Delta$ such that $\Delta < 1/2$, the following holds:

$$F(R,Q,\Delta) = \begin{cases} +\infty, & R > 1 - h_b(\Delta), \\ \sup_{\rho \ge \rho_0} \left[ \rho R - E^{\Delta}(\rho,Q) \right], & R(Q,\Delta) < R \le 1 - h_b(\Delta), \\ 0, & R \le R(Q,\Delta), \end{cases} \qquad (76)$$

where the rate-distortion function is $R(Q,\Delta) = h_b(q) - h_b(\Delta)$ and $\rho_0 = 0$ if $q \ge \Delta$; otherwise
$R(Q,\Delta) = 0$ and $\rho_0$ is the unique root of the equation $H(Q^{(\rho)}) = h_b(\Delta)$ such that $\rho_0 > 0$.
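The parametric form in Lemma 7 can be checked numerically for rates strictly between $R(Q,\Delta)$ and $1 - h_b(\Delta)$. The sketch below is an illustration only: it assumes base-2 logarithms, the standard tilted distribution $Q^{(\rho)}(s) \propto Q(s)^{1/(1+\rho)}$, and hypothetical helper names. `f_parametric` maximizes $\rho R - E^{\Delta}(\rho,Q)$ over $[\rho_0, \rho_{\max}]$ by ternary search (the objective is concave in $\rho$), while `f_direct` solves $h_b(p) - h_b(\Delta) = R$ for $p$ and evaluates the divergence; the cap `rho_max` is an assumption that fails for rates very close to $1 - h_b(\Delta)$.

```python
import math


def hb(p):
    """Binary entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def e_delta(rho, q, delta):
    """E^Delta(rho, Q) from (75), in bits."""
    s = 1.0 / (1.0 + rho)
    return (1.0 + rho) * math.log2(q ** s + (1.0 - q) ** s) - rho * hb(delta)


def rho_zero(q, delta):
    """rho_0 of Lemma 7: 0 if q >= delta, else the root of H(Q^(rho)) = h_b(delta)."""
    if q >= delta:
        return 0.0
    # The tilted probability equals delta when (q/(1-q))^(1/(1+rho)) = delta/(1-delta).
    return math.log(q / (1.0 - q)) / math.log(delta / (1.0 - delta)) - 1.0


def f_parametric(rate, q, delta, rho_max=200.0, iters=200):
    """sup over rho in [rho_0, rho_max] of rho*R - E^Delta(rho, Q), by ternary search."""
    lo, hi = rho_zero(q, delta), rho_max
    objective = lambda rho: rho * rate - e_delta(rho, q, delta)
    for _ in range(iters):
        a, b = lo + (hi - lo) / 3.0, hi - (hi - lo) / 3.0
        if objective(a) < objective(b):
            lo = a
        else:
            hi = b
    return objective(0.5 * (lo + hi))


def f_direct(rate, q, delta, iters=200):
    """D(P || Q) for the binary P with h_b(p) - h_b(delta) = R and p in [max(q, delta), 1/2]."""
    target = rate + hb(delta)
    lo, hi = max(q, delta), 0.5
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if hb(mid) < target:
            lo = mid
        else:
            hi = mid
    p = 0.5 * (lo + hi)
    return p * math.log2(p / q) + (1.0 - p) * math.log2((1.0 - p) / (1.0 - q))
```

Both routines should agree to numerical precision in the middle branch of (76), for either ordering of $q$ and $\Delta$.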

The proof of this lemma is given in Appendix D. It can be easily verified that
$F(R,Q,\Delta)$ is continuous and convex in $R \in (-\infty, 1 - h_b(\Delta)]$ if $q \ge \Delta$, and that $F(R,Q,\Delta)$
is continuous and convex in $R \in (0, 1 - h_b(\Delta)]$ and has a jump at $R = R(Q,\Delta) = 0$ if
$q < \Delta$. According to Lemma 7, the source error exponent $tF(R/t,Q,\Delta)$ is the convex
transform of $tE^{\Delta}(\rho,Q)$ on $[\rho_0, +\infty)$. Define the binary divergence by

$$D(\Delta \,\|\, q) \triangleq \Delta \log\frac{\Delta}{q} + (1-\Delta) \log\frac{1-\Delta}{1-q}. \qquad (77)$$

Adopting the approach of Section 3, we can apply Fenchel's Duality Theorem to $\underline{E}_r^{\Delta}(Q,W,t)$
and $\overline{E}_{sp}^{\Delta}(Q,W,t)$ and obtain equivalent computable bounds.

Theorem 9 Given a binary DMS ($q \le 1/2$) and a DMC $W$ under the Hamming distortion
measure and distortion threshold $\Delta$ ($\Delta < 1/2$), the JSCC exponent satisfies the
following.

1) Lower Bound: If $0 \le \Delta < \sqrt{q}/(\sqrt{q} + \sqrt{1-q})$, then $\rho_0 < 1$ and

$$\underline{E}_r^{\Delta}(Q,W,t) = \max_{\rho_0 \le \rho \le 1} \left[ T_r(\rho,W) - tE^{\Delta}(\rho,Q) \right]. \qquad (78)$$

Otherwise, if $\Delta \ge \sqrt{q}/(\sqrt{q} + \sqrt{1-q})$, then

$$\underline{E}_r^{\Delta}(Q,W,t) = tD(\Delta \,\|\, q) + E_0(1,W). \qquad (79)$$

2) Upper Bound:

$$\overline{E}_{sp}^{\Delta}(Q,W,t) = \sup_{\rho \ge \rho_0} \left[ T_{sp}(\rho,W) - tE^{\Delta}(\rho,Q) \right]. \qquad (80)$$

Since the above result is a simple extension of the results in Section 3, the proof is
omitted and we only provide the following remarks.



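(a) Similar to the lossless case, if $t(h_b(q) - h_b(\Delta)) \ge C$, then $\underline{E}_r^{\Delta}(Q,W,t) = \overline{E}_{sp}^{\Delta}(Q,W,t) = 0$.
If $R_{\infty}(W) > t(1 - h_b(\Delta))$, then $\overline{E}_{sp}^{\Delta}(Q,W,t) = +\infty$.

(b) Note that when $\Delta \ge \sqrt{q}/(\sqrt{q} + \sqrt{1-q})$, $\underline{E}_r^{\Delta}(Q,W,t)$ in (71) is achieved as $R \downarrow 0^{+}$,
and

$$\begin{aligned} \underline{E}_r^{\Delta}(Q,W,t) &= \lim_{R \downarrow 0^{+}} \left[ tF\!\left(\tfrac{R}{t}, Q, \Delta\right) + E_r(R,W) \right] \\ &= \lim_{R \downarrow 0^{+}} \left[ t \inf_{P\,:\,R(P,\Delta) \ge R/t} D(P \,\|\, Q) + E_0(1,W) - R \right] \\ &= tD(\Delta \,\|\, q) + E_0(1,W). \end{aligned}$$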

(c) In the special case where the binary source is uniform, i.e., $q = 1/2$, Theorem 9
reduces to

$$\max_{0 \le \rho \le 1} \left[ -\rho t(1 - h_b(\Delta)) + T_r(\rho,W) \right] \le E_J^{\Delta}(Q,W,t) \le \sup_{\rho \ge 0} \left[ -\rho t(1 - h_b(\Delta)) + T_{sp}(\rho,W) \right].$$

This is clearly equivalent to

$$E_r\big(t(1 - h_b(\Delta)), W\big) \le E_J^{\Delta}(Q,W,t) \le E_{sp}\big(t(1 - h_b(\Delta)), W\big) \qquad (81)$$

by the definition of $T_r(\rho,W)$ and $T_{sp}(\rho,W)$. In other words, $E_J^{\Delta}$ is bounded by
the channel random-coding and sphere-packing bounds at rate $t(1 - h_b(\Delta))$. If
$t(1 - h_b(\Delta)) \ge R_{cr}(W)$, then $E_J^{\Delta}$ is exactly determined.
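As a numerical illustration of (81), the following sketch evaluates the two channel bounds at rate $t(1 - h_b(\Delta))$ for a BSC. It is written under explicit assumptions: the channel is a BSC with crossover probability `eps`, equiprobable inputs are used in Gallager's $E_0$ function (they are optimal for the BSC), exponents are in bits, the maximizations use ternary search on the concave objective $E_0(\rho) - \rho R$, and the search range for the sphere-packing bound is capped at a hypothetical `rho_max`. The function names are illustrative and not taken from the paper.

```python
import math


def hb(p):
    """Binary entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def binary_divergence(a, b):
    """D(a || b) in bits, as in (77)."""
    return a * math.log2(a / b) + (1.0 - a) * math.log2((1.0 - a) / (1.0 - b))


def bsc_e0(rho, eps):
    """Gallager's E_0(rho) in bits for a BSC(eps) with equiprobable inputs."""
    s = 1.0 / (1.0 + rho)
    return rho - (1.0 + rho) * math.log2(eps ** s + (1.0 - eps) ** s)


def _maximize(func, lo, hi, iters=200):
    """Ternary search for the maximum value of a concave function on [lo, hi]."""
    for _ in range(iters):
        a, b = lo + (hi - lo) / 3.0, hi - (hi - lo) / 3.0
        if func(a) < func(b):
            lo = a
        else:
            hi = b
    return func(0.5 * (lo + hi))


def random_coding_exponent(rate, eps):
    """E_r(R, W) = max over 0 <= rho <= 1 of E_0(rho) - rho * R."""
    return _maximize(lambda rho: bsc_e0(rho, eps) - rho * rate, 0.0, 1.0)


def sphere_packing_exponent(rate, eps, rho_max=200.0):
    """E_sp(R, W) = sup over rho >= 0 of E_0(rho) - rho * R (range capped at rho_max)."""
    return _maximize(lambda rho: bsc_e0(rho, eps) - rho * rate, 0.0, rho_max)


def uniform_source_bounds(t, delta, eps):
    """Lower and upper bounds of (81) for a uniform binary source over a BSC(eps)."""
    rate = t * (1.0 - hb(delta))
    return random_coding_exponent(rate, eps), sphere_packing_exponent(rate, eps)
```

For $t = 1$ and `eps` $< \Delta < 1/2$ the sphere-packing value reduces to the binary divergence between $\Delta$ and `eps`, since the rate $1 - h_b(\Delta)$ corresponds to a Gilbert-Varshamov distance of exactly $\Delta$; `binary_divergence` is included to make that comparison easy.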

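(d) When the source is nonuniform, $tE^{\Delta}(\rho,Q) = tE_s(\rho,Q) - \rho t h_b(\Delta)$ is strictly convex
in $\rho$. In this case, the maximizer

$$\overline{\rho}_{\Delta} \triangleq \arg\sup_{\rho \ge \rho_0} \left[ T_{sp}(\rho,W) - tE^{\Delta}(\rho,Q) \right]$$

is strictly larger than $\rho_0$ if $t(h_b(q) - h_b(\Delta)) < C$ and $R_{\infty}(W) < t(1 - h_b(\Delta))$.
In particular, $\overline{\rho}_{\Delta} < \infty$ if $R_{\infty}(W) < t(1 - h_b(\Delta))$. As counterparts of Lemma 5
and Corollary 1, it can be shown that the upper bound $\overline{E}_{sp}^{\Delta}(Q,W,t)$ in (72) is
attained at $\overline{R}_m^{\Delta} = t\big[H(Q^{(\overline{\rho}_{\Delta})}) - h_b(\Delta)\big]$ and the lower bound in (71) is attained at
$\underline{R}_m^{\Delta} = t\big[H(Q^{(\underline{\rho}_{\Delta})}) - h_b(\Delta)\big]$, where $\underline{\rho}_{\Delta} = \min\{\overline{\rho}_{\Delta}, 1\}$. Consequently, other
results similar to the lossless case regarding these optimizers can be obtained.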

Example 6 For a binary DMS $\{q, 1-q\}$ ($q \le 0.5$) and a BSC ($\varepsilon$) under transmission
rate $t = 1$, we compute the JSCC error exponent under the Hamming distortion measure
with distortion threshold $\Delta$ ($\Delta < 1/2$). In Fig. 11, if the pair $(\varepsilon, q)$ is located in region B,
then the corresponding JSCC exponent can be determined exactly (the lower and upper
bounds are equal). If $(\varepsilon, q)$ is located in region $C_1$, then $E_J^{\Delta}$ is bounded by (78) and (80).
If $(\varepsilon, q)$ is located in region $C_2$, then $E_J^{\Delta}$ is bounded by (79) and (80). When $(\varepsilon, q) \in A$,
$E_J^{\Delta}$ is zero, and the error probability of this communication system converges to 1 as $n$
grows. So we are only interested in the cases when $(\varepsilon, q) \in B \cup C_1 \cup C_2$.

Fig. 12 shows the JSCC error exponent lower bound of the binary DMS $\{q, 1-q\}$
($q \le 0.5$) and BSC ($\varepsilon$) pairs under different distortion thresholds. We fix the BSC
parameter $\varepsilon = 0.2$, and vary $q$ from 0 to 0.5. In Fig. 12, Segment 1 is determined by (79),
and Segments 2 and 3 are determined by (78). Furthermore, the lower bound coincides
with the upper bound (80) in Segment 3; i.e., the JSCC exponent is exactly determined
in Segment 3.

6 Conclusions 

In this work, we establish equivalent parametric representations of Csiszár's lower and
upper bounds for the JSCC exponent $E_J$ of a communication system with a DMS and
a DMC, and we obtain explicit conditions under which the JSCC exponent is exactly
determined. As a result, the computation of the bounds for $E_J$ is facilitated for arbitrary
DMS-DMC pairs. Furthermore, the bounds enjoy closed-form expressions when the channel
is symmetric. A byproduct of our result is the fact that Csiszár's random-coding lower
bound for $E_J$ is in general larger than Gallager's lower bound [23].

We also provide a systematic comparison between $E_J$ and $E_T$, the tandem coding error
exponent. We show that JSCC can at most double the error exponent vis-a-vis tandem
coding by proving that $E_J \le 2E_T$, and we provide the condition for achieving this doubling
effect. In the case where this upper bound is not tight, we also establish explicit sufficient
conditions under which $E_J > E_T$. Numerical results indicate that $E_J \approx 2E_T$ for a large
class of DMS-DMC pairs, hence illustrating the substantial potential benefit of JSCC
over tandem coding. This benefit is also shown to result in a power saving of more
than 2 dB for a binary DMS and a BPSK-modulated AWGN/Rayleigh channel with finite
output quantization. Finally, we partially investigate the computation of Csiszár's lower
and upper bounds for the lossy JSCC exponent under the Hamming distortion measure,
and obtain equivalent representations for these bounds using the same approach as for
the lossless JSCC exponent.



A Proof of Theorem 2 and Corollary 1 



Theorem 2 can be shown by a left- and right-derivative argument combined with the
results of Lemma 5. Let $s_l(R)$ and $s_r(R)$ be the left and right slopes (or left and right
derivatives) of $E_{sp}(R,W)$ at each $R > R_{\infty}(W)$. Let $r_l(R)$ and $r_r(R)$ be the left and
right slopes of $E_r(R,W)$ at each $R > 0$. Let $\rho(R)$ be the slope of $te(R/t,Q)$ for any
$R \in [tH(Q), t\log|S|]$. It is easy to verify that these slopes have the following properties
(cf. [13], [23], [43]):

(a) $s_l(R)$ and $s_r(R)$ exist for every $R > R_{\infty}(W)$ and are nondecreasing in $R$.

(b) $r_l(R)$ and $r_r(R)$ exist for every $R > 0$ and are nondecreasing in $R$.

(c) $s_l(R) \le s_r(R) < -1$ for $R < R_{cr}(W)$, $-1 \le s_l(R) \le s_r(R) < 0$ for $R_{cr}(W) < R < C$,
and $s_l(R) = s_r(R) = 0$ for $R > C$. Moreover, $s_l(R_{cr}(W)) \le -1 \le s_r(R_{cr}(W))$ and $s_l(C) \le 0 = s_r(C)$.

(d) $r_l(R) = r_r(R) = -1$ for $R < R_{cr}(W)$, $r_l(R) = s_l(R)$ for $R > R_{cr}(W)$, and
$r_r(R) = s_r(R)$ for $R \ge R_{cr}(W)$. Moreover, $r_l(R_{cr}(W)) = -1 \le r_r(R_{cr}(W))$.

(e) $\rho(R)$ is a strictly increasing function of $R$ and is determined by $R = tH(Q^{(\rho(R))})$ for
$tH(Q) \le R \le t\log|S|$. Specifically, $\rho(tH(Q)) = 0$ and $\rho(t\log|S|) = \infty$.

(f) $\overline{\rho}^{*} = \rho(\overline{R}_m)$, where $\overline{\rho}^{*}$ and $\overline{R}_m$ are defined in (37) and (39), respectively.

(a) and (b) follow from the convexity of $E_{sp}(R,W)$ for $R > R_{\infty}(W)$ and of $E_r(R,W)$
for $R > 0$, see [43, pp. 113-114]. Recalling that $E_r(R,W)$ involves a straight-line section
with slope $-1$ for $R \in [0, R_{cr}(W)]$ and that $E_r(R,W) = E_{sp}(R,W)$ only for $R \ge R_{cr}(W)$,
where they both are equal to 0 for $R \ge C$, we obtain (c) and (d) from (a) and (b). From
(24), we know that $te(R/t,Q) = tD(Q^{(\rho^{*})} \,\|\, Q)$ for $tH(Q) \le R \le t\log|S|$, where $\rho^{*}$ is
the unique root of $tH(Q^{(\rho)}) = R$. Also, it is easy to verify [13] that such $\rho^{*}$ is exactly the
slope of $te(R/t,Q)$ at $R$, i.e.,

$$\frac{\partial\, te(R/t,Q)}{\partial R} = \rho^{*}.$$

Thus (e) follows. Recalling also that in Lemma 5 we have shown the relation $\overline{R}_m =
tH(Q^{(\overline{\rho}^{*})})$, since there is a unique $\rho$ satisfying this equation, we obtain (f).
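Property (e) can be checked numerically for a binary source. The sketch below is illustrative only and rests on assumptions not spelled out in this appendix: base-2 logarithms, a binary DMS $\{q, 1-q\}$ with $q < 1/2$, the tilted distribution $Q^{(\rho)}(s) \propto Q(s)^{1/(1+\rho)}$, and hypothetical function names. It solves $tH(Q^{(\rho)}) = R$ by bisection and compares the resulting $\rho$ with a central finite-difference slope of $te(R/t,Q)$.

```python
import math


def hb(p):
    """Binary entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def tilted(q, rho):
    """First component of the tilted distribution Q^(rho) for Q = {q, 1 - q}."""
    s = 1.0 / (1.0 + rho)
    a, b = q ** s, (1.0 - q) ** s
    return a / (a + b)


def rho_of_rate(rate, t, q, rho_max=1e3, iters=200):
    """Solve t * H(Q^(rho)) = R for rho by bisection (H(Q^(rho)) increases with rho)."""
    lo, hi = 0.0, rho_max
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if t * hb(tilted(q, mid)) < rate:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def scaled_source_exponent(rate, t, q):
    """t * e(R/t, Q) computed as t * D(Q^(rho) || Q) with rho = rho_of_rate(R)."""
    p = tilted(q, rho_of_rate(rate, t, q))
    return t * (p * math.log2(p / q) + (1.0 - p) * math.log2((1.0 - p) / (1.0 - q)))


def numerical_slope(rate, t, q, h=1e-4):
    """Central finite-difference estimate of the slope of t * e(R/t, Q) at R."""
    return (scaled_source_exponent(rate + h, t, q)
            - scaled_source_exponent(rate - h, t, q)) / (2.0 * h)
```

For example, with $q = 0.1$ and $t = 0.75$, any rate strictly between $tH(Q)$ and $t$ can be used to compare `numerical_slope` with `rho_of_rate`; the finite-difference step `h` is an arbitrary choice.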



Based on the above setup, the following lemma illustrates the geometric conditions under
which $\underline{E}_r(Q,W,t)$ and $\overline{E}_{sp}(Q,W,t)$ are attained.

Lemma 8 Let $tH(Q) < C$ and let $R_{\infty}(W) < t\log|S|$. The minimum in (30) is attained
at $\overline{R}_m$ if and only if $-s_l(\overline{R}_m) \ge \rho(\overline{R}_m) \ge -s_r(\overline{R}_m)$, and the minimum in (29) is attained
at $\underline{R}_m$ if and only if $-r_l(\underline{R}_m) \ge \rho(\underline{R}_m) \ge -r_r(\underline{R}_m)$.

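Proof:

1. Forward part: We only show the case for the upper bound $\overline{E}_{sp}(Q,W,t)$, since the
case for the lower bound can be shown in a similar manner. We first show that a rate
$R_1 \in [tH(Q), t\log|S|]$ satisfying $-s_l(R_1) \ge \rho(R_1) \ge -s_r(R_1)$ must achieve the minimum
in $\overline{E}_{sp}(Q,W,t)$. Define the functions

$$f_1(R) \triangleq \begin{cases} E_{sp}(R,W), & \text{if } R \le R_1, \\ E_{sp}(R_1,W) - \dfrac{|s_l(R_1)| + |\rho(R_1)|}{2}\,(R - R_1), & \text{if } R > R_1, \end{cases}$$

and

$$g_1(R) \triangleq \begin{cases} te\!\left(\tfrac{R}{t},Q\right), & \text{if } R \le R_1, \\ te\!\left(\tfrac{R_1}{t},Q\right) + \dfrac{|\rho(R_1)| + |s_l(R_1)|}{2}\,(R - R_1), & \text{if } R > R_1. \end{cases}$$

Since $-s_l(R_1) \ge \rho(R_1)$ implies $s_l(R_1) \le -(|s_l(R_1)| + |\rho(R_1)|)/2$ and $\rho(R_1) \le (|\rho(R_1)| +
|s_l(R_1)|)/2$, we claim that $f_1(R)$ and $g_1(R)$ are both convex functions and hence their sum
is convex,

$$f_1(R) + g_1(R) = \begin{cases} te\!\left(\tfrac{R}{t},Q\right) + E_{sp}(R,W), & \text{if } R \le R_1, \\ te\!\left(\tfrac{R_1}{t},Q\right) + E_{sp}(R_1,W), & \text{if } R > R_1. \end{cases}$$

Since the convex function $f_1(R) + g_1(R)$ is constant for $R \ge R_1$ (noting that the convexity
is strict in the interval $[tH(Q), R_1]$), we may write

$$\min_{tH(Q) \le R \le R_1} \left[ te\!\left(\tfrac{R}{t},Q\right) + E_{sp}(R,W) \right] = te\!\left(\tfrac{R_1}{t},Q\right) + E_{sp}(R_1,W).$$

Similarly, using the relation $\rho(R_1) \ge -s_r(R_1)$ we can construct the convex functions

$$f_2(R) \triangleq \begin{cases} E_{sp}(R,W), & \text{if } R \ge R_1, \\ E_{sp}(R_1,W) - \dfrac{\rho(R_1) - s_r(R_1)}{2}\,(R - R_1), & \text{if } R < R_1, \end{cases}$$

and

$$g_2(R) \triangleq \begin{cases} te\!\left(\tfrac{R}{t},Q\right), & \text{if } R \ge R_1, \\ te\!\left(\tfrac{R_1}{t},Q\right) + \dfrac{\rho(R_1) - s_r(R_1)}{2}\,(R - R_1), & \text{if } R < R_1, \end{cases}$$

and use them to show that the minimum

$$\min_{R_1 \le R \le t\log|S|} \left[ te\!\left(\tfrac{R}{t},Q\right) + E_{sp}(R,W) \right]$$

is attained at $R_1$. Thus, $R_1$ is the minimizer of $\overline{E}_{sp}(Q,W,t)$, i.e.,

$$\min_{tH(Q) \le R \le t\log|S|} \left[ te\!\left(\tfrac{R}{t},Q\right) + E_{sp}(R,W) \right] = te\!\left(\tfrac{R_1}{t},Q\right) + E_{sp}(R_1,W).$$

2. Converse part: We assume that $\overline{R}_m \in (R_{\infty}(W), t\log|S|)$ achieves the minimum in (30) but
$\rho(\overline{R}_m) < -s_r(\overline{R}_m)$. Note that $\rho(t\log|S|) = \infty > -s_r(t\log|S|)$ provided that $t\log|S| >
R_{\infty}(W)$. Now let $R_1$ be the smallest rate in $[R_{\infty}(W), t\log|S|]$ satisfying $\rho(R_1) \ge -s_r(R_1)$.
According to our assumption together with (a) and (e), $R_1 > \overline{R}_m$. However, using our
previous method, we can construct two convex functions $f_1(R)$ and $g_1(R)$ associated with
$R_1$ to show

$$\min_{tH(Q) \le R \le R_1} \left[ te\!\left(\tfrac{R}{t},Q\right) + E_{sp}(R,W) \right] = te\!\left(\tfrac{R_1}{t},Q\right) + E_{sp}(R_1,W).$$

This contradicts the assumption that the minimum is attained at $\overline{R}_m$, a
rate smaller than $R_1$, since the minimizer is unique due to the strict convexity. Thus, at
$\overline{R}_m$ we must have $\rho(\overline{R}_m) \ge -s_r(\overline{R}_m)$. We can show in a similar manner
that $\rho(\overline{R}_m) \le -s_l(\overline{R}_m)$. ■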

The following facts immediately follow from Lemma 8.

Lemma 9 We have the following relations between $\underline{R}_m$ and $\overline{R}_m$:

(1) If $\underline{R}_m > R_{cr}(W)$ or $\overline{R}_m > R_{cr}(W)$, then $\underline{R}_m = \overline{R}_m > R_{cr}(W)$ and $\overline{E}_{sp}(Q,W,t) =
\underline{E}_r(Q,W,t)$.

(2) If $\overline{R}_m = R_{cr}(W)$, then $\underline{R}_m \le R_{cr}(W)$.

(3) $\overline{R}_m \ge \underline{R}_m$.

Proof: (1) is trivial since $E_r(R,W) = E_{sp}(R,W)$ for $R \ge R_{cr}(W)$. If $\overline{R}_m = R_{cr}(W)$,
then by Lemma 8 and (d), $\rho(R_{cr}(W)) \ge -s_r(R_{cr}(W)) = -r_r(R_{cr}(W))$. Using Lemma 8
again we obtain (2). To show (3), we only need to show the case when $\overline{R}_m < R_{cr}(W)$.
According to Lemma 8 together with (c) and (d), we see $\rho(\overline{R}_m) \ge 1$ and $\rho(\underline{R}_m) = 1$. It
follows from (e) that $\overline{R}_m \ge \underline{R}_m$. ■

This lemma emphasizes that when the JSCC error exponent upper bound is achieved
at a rate equal to the channel critical rate $R_{cr}(W)$, the lower bound could be achieved at
a rate smaller than $R_{cr}(W)$.

In the sequel we shall use properties (c)-(f), and Lemmas 5, 8 and 9 to prove Theorem 2.
To show $A \Leftrightarrow B \Leftrightarrow C$, we only need to show $A \Rightarrow B$ (Forward) and $B \Rightarrow C \Rightarrow A$
(Converse).

1. Converse Part. We start from 

P* < 1 p(Rm) < 1 (by (f)) 

=*► R m <tR$(Q) (by(e)) 

and s r (R m ) > — 1 (by Lemma 8) 

Rrn> Rcr(W) (by ( C )) 

=>- tR^(Q) > R m = R m > Rcr(W) (by Lemma 9 (1)) (82) 

or tR ( C r\Q) >R m = Rcr(W) > R m (by Lemma 9 (2)) (83) 

=>. < p* = p* < 1 (84) 

and tRt\Q) >R m = R m > Rcr(W), (85) 

where (84) and (85) are explained as follows. We first claim p* < 1, because p* = 1 
would yield R m > tR^(Q) by Lemma 5 (3), which is contradicted with (82) and (83). 
Since now p* < 1, from Lemma 8 and (d) we know R m > R cr (W). Thus in (83) we must 
have R m = R cr (W) and consequently (82) and (83) can both be summarized by (85). 



Meanwhile, p* = p* follows by Lemma 5. If now 

t = 1 =► p(Rm) = 1 (by (f)) 

=S> R m = tR< c s r ) {Q) (by(e)) 

and si(R m ) < — 1 < s r (R m ) (by Lemma 8) 

Rrn > Rcr{W) (by ( C )) 

=>■ tR${Q) =R m = Rm> Rcr(W) (by Lemma 9 (1)) (86) 

or tR&\Q) = R m = RcriW) > R m (by Lemma 9 (2)) (87) 

=>. p* = p* = 1 (88) 

and t^r ) (g)=^ m = ^ m >i? C r(W / ), (89) 

where (88) and (89) are explained as follows. We first claim that p* = 1. If p* < 1, then 
by Lemma 5 (3) we have R m < tR ( c S r\Q). In (86), we see R m = tR^(Q), contradicted. 
In (87), it is still impossible that R m < tR^)(Q) = R cr (W), because in that case we have 
p{R m ) < p(tRcr(Q)) = 1 by (e), which violates Lemma 8 since R m < R cr (W) implies 
p{R m ) = 1. Thus we must have p* = 1 and (88) follows. According to Lemma 5 (3) 
again, p* — 1 implies R m > tRc)(Q). Hence in (87) we must have R m = tR$(Q). (86) 
and (87) can both be summarized by (89). Next if 

p* > 1 ==> p(R m ) > 1 (by (f)) 

=}► R m >tR$(Q) (by(e)) (90) 
and si(R m ) < — 1 (by Lemma 8) 

Rm < RcriW) (by ( C )) 

=>- R m <R m < Rcr{W) (by Lemma 9 (1) and (3)) 

=► ^ m < i? cr (^) (91) 

=► = -1 = r r (£j (by(d)) 

^> p{R m ) = 1 (by Lemma 8) 

=}► R m = tR${Q) (by(e)) (92) 

^> p* = 1 (by Lemma 5 (3)) 

and R m > R m . (by (90) and (92)). 

To see (91), we let $\overline{R}_m = \underline{R}_m = R_{cr}(W)$. Then using (d) and Lemma 8 yields $\rho(\underline{R}_m) \le 1$,
which contradicts the assumption $\rho(\overline{R}_m) = \rho(\underline{R}_m) > 1$. To show the last step,
we assume $\underline{\rho}^{*} < 1$; then Lemma 5 (3) ensures $\underline{R}_m = tH(Q^{(\underline{\rho}^{*})}) < tR_{cr}^{(s)}(Q)$, which
contradicts the second-to-last step.

2. Forward Part. First recall that $\rho(tR_{cr}^{(s)}(Q)) = 1$ by (e). Now if $tR_{cr}^{(s)}(Q) \ge R_{cr}(W)$, then
$\overline{R}_m$ cannot be strictly larger than $tR_{cr}^{(s)}(Q)$ because in that case $\rho(\overline{R}_m) > \rho(tR_{cr}^{(s)}(Q)) = 1$ and
$-s_l(\overline{R}_m) \le 1$ by (c), which violates Lemma 8. It then follows that $\overline{R}_m \le tR_{cr}^{(s)}(Q)$ and hence
$\overline{\rho}^{*} \le 1$ by (e). Conversely, if $tR_{cr}^{(s)}(Q) < R_{cr}(W)$, then $\overline{R}_m$ cannot be less than (or equal
to) $tR_{cr}^{(s)}(Q)$ because in that case $\rho(\overline{R}_m) \le \rho(tR_{cr}^{(s)}(Q)) = 1$ and $-s_r(\overline{R}_m) > 1$ by (c), which
violates Lemma 8. It then follows that $\overline{R}_m > tR_{cr}^{(s)}(Q)$ and hence $\overline{\rho}^{*} > 1$ by (e).

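Finally we should note that when $tR_{cr}^{(s)}(Q) < R_{cr}(W)$, or $\overline{\rho}^{*} > 1$, the lower bound is
achieved by $\underline{R}_m = tR_{cr}^{(s)}(Q) < R_{cr}(W)$ and $\underline{\rho}^{*} = 1$. Thus

$$\begin{aligned} \underline{E}_r(Q,W,t) &= te\!\left(\tfrac{\underline{R}_m}{t},Q\right) + E_r(\underline{R}_m,W) \\ &= \left[ \underline{\rho}^{*}\underline{R}_m - tE_s(\underline{\rho}^{*},Q) \right] + \left[ E_0(1,W) - \underline{\rho}^{*}\underline{R}_m \right] \\ &= E_0(1,W) - tE_s(1,Q). \end{aligned}$$

Meanwhile, Corollary 1 immediately follows by the above argument. ■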



B Proof of Theorem 7 

We first recall that if $-t\log(|S|Q(s)) < E(t\log|S|, W)$, then there is no intersection
between $te(R/t,Q)$ and $E(R,W)$. Clearly, the tandem coding exponent satisfies

$$\begin{aligned} E_T(Q,W,t) &= E(t\log|S|, W) \\ &= E_r(t\log|S|, W) && (93) \\ &< E_r(\underline{R}_m, W) && (94) \\ &\le E_J(Q,W,t). \end{aligned}$$

Here, (93) follows by the hypothesis $R_{cr}(W) < t\log|S|$, and (94) holds since $\underline{R}_m$ must be
smaller than $t\log|S|$ by Corollary 1.

We hence assume that $-t\log(|S|Q(s)) \ge E(t\log|S|, W)$, i.e., we assume that $te(R/t,Q)$
and $E(R,W)$ intersect at rate $R_o$. If $R_o \ge R_{cr}(W)$, which means that $E_0(1,W) -
R_{cr}(W) \ge te(R_{cr}(W)/t, Q)$, then Theorem 6 guarantees that $E_J > E_T$. If $\underline{R}_m \ge R_{cr}(W)$,
then $tR_{cr}^{(s)}(Q) \ge R_{cr}(W)$ by Corollary 2, which ensures $E_J > E_T$ by Theorem 6.

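Furthermore, if $R_{cr}(W) > \underline{R}_m \ge R_o$, then

$$\begin{aligned} E_J(Q,W,t) &\ge te\!\left(\tfrac{\underline{R}_m}{t},Q\right) + E_r(\underline{R}_m,W) \\ &> te\!\left(\tfrac{R_o}{t},Q\right) \\ &= E_T(Q,W,t). \end{aligned}$$

In the remainder of the proof, we assume that $te(R/t,Q)$ and $E(R,W)$ intersect at rate $R_o$ and
that $\underline{R}_m < R_o < R_{cr}(W)$.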

For a DMC with $E_{ex}(0,W) < \infty$, we may define the upper bound of the channel error
exponent by

$$E_s(R,W) \triangleq \begin{cases} E_{sl}(R,W), & 0 \le R \le R_s, \\ E_{sp}(R,W), & R_s \le R \le C, \end{cases}$$

where $E_{sl}(R,W)$ is the straight-line upper bound for the channel error exponent, and $R_s$
is the rate where the straight-line upper bound is tangent to the sphere-packing bound,
with $R_s \le R_{cr}(W)$ [19], [23]. Clearly, $E_s(R,W)$ is also convex in $0 \le R \le C$, and it is
shown in [19], [23] that

$$E_s(0,W) = E_{sl}(0,W) = E_{ex}(0,W).$$

Now connect $(0, E_s(0,W))$ and $(R_{cr}(W), E_s(R_{cr}(W),W))$ with a straight line, denoted by
$l_1$, where

$$E_s(R_{cr}(W),W) = E_r(R_{cr}(W),W) = E_0(1,W) - R_{cr}(W).$$

Again, connect $(\underline{R}_m, te(\underline{R}_m/t,Q))$ and $(t\log|S|, te(\log|S|,Q))$ with a straight line,
denoted by $l_2$, where

$$te\!\left(\tfrac{\underline{R}_m}{t},Q\right) = tD(Q^{(1)} \,\|\, Q),$$

and

$$te(\log|S|,Q) = -t\log(|S|Q(s)).$$

Suppose that the intersection of $E_s(R,W)$ and $te(R/t,Q)$ is $(R_1, te(R_1/t,Q))$, and that
the intersection of $l_1$ and $l_2$ is $(R_2, E_{R_2})$. By assumption, $R_o$, the intersection of $te(R/t,Q)$
and $E(R,W)$, is strictly larger than $\underline{R}_m$ and strictly less than $R_{cr}(W)$; hence, since $E_s(R,W)$
upper bounds $E(R,W)$, $R_1$, the intersection of $te(R/t,Q)$ and $E_s(R,W)$, must also be strictly
larger than $\underline{R}_m$ and strictly less than $R_{cr}(W)$, i.e., $\underline{R}_m < R_o \le R_1 < R_{cr}(W)$. Likewise, it is
easily seen that $\underline{R}_m < R_2 < R_{cr}(W)$. Furthermore, because of the convexity of $te(R/t,Q)$ and
$E_s(R,W)$ in the region $[\underline{R}_m, R_{cr}(W)]$, $E_{R_2}$ must be strictly larger than $te(R_1/t,Q)$ (as $te(R/t,Q)$
is strictly convex in this interval). It follows that

$$E_J(Q,W,t) \ge E_0(1,W) - tE_s(1,Q) > E_{R_2} > te\!\left(\tfrac{R_1}{t},Q\right) \ge te\!\left(\tfrac{R_o}{t},Q\right) = E_T(Q,W,t).$$



C Proof of Theorem 8 



As in the previous proof, we only consider the case $-t\log_2(|S|Q(s)) \ge E(t\log_2|S|, W)$
and $\underline{R}_m < R_o < R_{cr}(W)$. Thus, we can upper bound $E_T$ by

$$E_T(Q,W,t) = te\!\left(\tfrac{R_o}{t},Q\right) < tD(Q^{(\gamma)} \,\|\, Q)$$

by the strict monotonicity of the source error exponent. On the other hand, Theorem 2
gives that

$$E_J(Q,W,t) \ge E_0(1,W) - tE_s(1,Q).$$

By assumption, if $E_0(1,W) - tE_s(1,Q) > tD(Q^{(\gamma)} \,\|\, Q)$, then $E_J > E_T$. ■



D Proof of Lemma 7 

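Recall that the rate-distortion function $R(Q,\Delta)$ for a binary DMS $Q = \{q, 1-q\}$ under
the Hamming distortion measure is given by (e.g., [16])

$$R(Q,\Delta) = \begin{cases} h_b(q) - h_b(\Delta), & 0 \le \Delta \le q, \\ 0, & \Delta > q. \end{cases} \qquad (95)$$

Clearly, $F(R,Q,\Delta) = 0$ for $R \le 0$ since the infimum in (73) is attained at $P = Q$. Similarly,
since $R(P,\Delta) \le 1 - h_b(\Delta)$ for all $P$, $F(R,Q,\Delta) = \infty$ for $R > 1 - h_b(\Delta)$. For the
remainder of the proof, we assume $0 < R \le 1 - h_b(\Delta)$.

(1) Case of $0 \le \Delta < q$. For $R \le R(Q,\Delta) = h_b(q) - h_b(\Delta)$, we have

$$F(R,Q,\Delta) = \inf_{P\,:\,R(P,\Delta) \ge R} D(P \,\|\, Q) = D(P \,\|\, Q)\big|_{P=Q} = 0.$$

For $h_b(q) - h_b(\Delta) < R \le 1 - h_b(\Delta)$, we have

$$\begin{aligned} F(R,Q,\Delta) &= \inf_{P\,:\,R(P,\Delta) \ge R} D(P \,\|\, Q) \\ &= \min_{P = \{p, 1-p\}\,:\,R(P,\Delta) = R} D(P \,\|\, Q) && (96) \\ &= \min_{p\,:\,h_b(p) - h_b(\Delta) = R} D(P \,\|\, Q) \\ &= e(R + h_b(\Delta), Q), \quad \text{for } H(Q) \le R + h_b(\Delta) \le \log|S| && (97) \\ &= \sup_{\rho \ge 0} \left[ \rho(R + h_b(\Delta)) - E_s(\rho,Q) \right] && (98) \\ &= \sup_{\rho \ge 0} \left[ \rho R - E^{\Delta}(\rho,Q) \right]. \end{aligned}$$

Here (96) follows from the facts that the continuous function $\theta(p) = p\log\frac{p}{q} + (1-p)\log\frac{1-p}{1-q}$
is increasing for $p \ge q$ and that $R(P,\Delta)$ given in (95) is continuous and increasing in $p$ for
$\Delta \le p \le \frac{1}{2}$. In (97), we note that $H(Q) = h_b(q)$ and that $\log|S| = 1$ as the source
is binary. (98) follows by the well-known parametric form of the source exponent function
introduced by Blahut [13], noting that $R' = R + h_b(\Delta) \in [H(Q), \log|S|]$.

(2) Case of $\Delta \ge q$. For $0 < R \le 1 - h_b(\Delta)$, similarly to (97), we have

$$F(R,Q,\Delta) = e(R',Q) = \sup_{\rho \in \Lambda} \left[ \rho R' - E_s(\rho,Q) \right],$$

where $R' = R + h_b(\Delta)$ is such that $H(Q) \le h_b(\Delta) < R' \le 1 = \log|S|$ and

$$\begin{aligned} \Lambda &\triangleq \left\{ \rho^{*} : \left. \frac{\partial\left[ \rho R' - E_s(\rho,Q) \right]}{\partial \rho} \right|_{\rho = \rho^{*}} = 0,\; h_b(\Delta) < R' \le 1 \right\} \\ &= \left\{ \rho^{*} : h_b(\Delta) < R' = H(Q^{(\rho^{*})}) \le 1 \right\} \\ &= \left\{ \rho^{*} : \rho_0 < \rho^{*} \le \infty \right\}, \qquad (99) \end{aligned}$$

where $\rho_0$ is the unique root of the equation $H(Q^{(\rho)}) = h_b(\Delta)$ and $\rho_0 \ge 0$. Here (99) follows
from the monotone property of $H(Q^{(\rho)})$. Therefore, we write

$$F(R,Q,\Delta) = \sup_{\rho \ge \rho_0} \left[ \rho R - E^{\Delta}(\rho,Q) \right].$$

In fact, it can be shown that $\rho_0$ is the right slope of $F(R,Q,\Delta)$ at $R = R(Q,\Delta)$. ■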
Figure 1: Example of a 6-ary input, 4-ary output DMC (see [23, Fig. 5.6.5]) for which
$E_0(\rho, W)$ is not concave.



[Figure 2 plot: Csiszár's sphere-packing and random-coding bounds for $q = 0.1$ and $t = 0.75$, $1$, $1.25$; annotations mark the two bounds as tight for $\varepsilon > 0.001$, $\varepsilon > 0.002$ and $\varepsilon > 0.0025$.]

Figure 2: Csiszár's random-coding and sphere-packing bounds for the system of Example 1.

[Figure 3 plot: $t = 1$, $q = 0.1$; region where Csiszár's bound exceeds Gallager's bound.]

Figure 3: Csiszár's random-coding bound vs Gallager's lower bound for the system of
Example 1.

Figure 4: The regions for the $(\varepsilon, q)$ pairs in the binary DMS $\{q, 1-q\}$ and BSC ($\varepsilon$) system
of Example 2 for different transmission rates $t$. Note that $E_J = 0$ on the boundary between
A and B; $E_J$ is exactly determined on the boundary between B and C. In A, $E_J = 0$.
In B, $E_J$ is positive and known exactly. In C, $E_J$ is positive and can be bounded above
and below.




Figure 5: The regions for the $(\alpha, q)$ pairs in the binary DMS $\{q, 1-q\}$ and BEC ($\alpha$)
system of Example 3 with $t = 1$. Note that $E_J = 0$ on the boundary between A and B;
$E_J$ is determined on the boundary between B and $C_1$; the random-coding bound and
expurgated bound to $E_J$ are equal on the boundary between $C_1$ and $C_2$.

[Figure 6 plot annotation: for $q = 0.1$, Exp-LB > RC-LB when $\alpha < 0.0297$.]

Figure 6: Improvement due to the expurgated lower bound for the binary DMS $\{q, 1-q\}$
and BEC ($\alpha$) system with $t = 1$. Exp-LB and RC-LB stand for the expurgated and
random-coding lower bounds, respectively.

Table 1: $E_J/E_T$ for the binary DMS and BSC pairs of Example 5. "N/A" means that
$tH(Q) > C$ such that $E_J = E_T = 0$. "†" means that this quantity is only a lower bound
for $E_J/E_T$.

Figure 7: The regions for binary DMS-BSC $(q, \varepsilon)$ pairs and binary DMS-BEC $(q, \alpha)$ pairs
under different transmission rates $t$. In region F (including the boundary between F and
H), $E_J > E_T > 0$; in region G (including the boundary between G and F), $E_J = E_T = 0$;
and in region H, $E_J \ge E_T > 0$.

Figure 8: Binary-input AWGN or Rayleigh-fading channel with finite output quantization.

[Figure 9 plot title: Binary DMS $Q = \{0.1, 0.9\}$, $t = 0.75$.]

Figure 9: The power gain due to JSCC for binary DMS and binary-input $2^m$-output DMC
(AWGN channel) with $t = 0.75$.

Figure 10: The power gain due to JSCC for binary DMS and binary-input $2^m$-output
DMC (Rayleigh-fading channel) with $t = 1$.



[Figure 11 plot: four panels for $\Delta = 0.1$, $0.2$, $0.3$ and $0.4$, each showing regions A, B, $C_1$ and $C_2$ in the $(\varepsilon, q)$ plane, with axis ticks from 0.1 to 0.5.]

Figure 11: The regions for the $(\varepsilon, q)$ pairs in the binary DMS $\{q, 1-q\}$ and BSC ($\varepsilon$) system
of Example 6 with Hamming distortion for different values of the distortion threshold $\Delta$
with $t = 1$. Note that $E_J^{\Delta} = 0$ on the boundary between A and B, and $E_J^{\Delta} > 0$ is
determined on the boundary between B and $C_1$.



Figure 12: Fix $\varepsilon = 0.2$. The JSCC exponent lower bound of the binary DMS $\{q, 1-q\}$
($q \le 0.5$) and BSC ($\varepsilon$) pairs under Hamming distortion with $t = 1$. For $\Delta = 0$, $E_J^{\Delta}$ is
determined if $q \in [0.0001, 0.0481]$, which is the same as the random-coding lower bound for
the lossless JSCC error exponent. For $\Delta = 0.1$, $E_J^{\Delta}$ is determined if $q \in [0.0209, 0.2129]$.
For $\Delta = 0.2$, $E_J^{\Delta}$ is determined if $q \in [0.0955, 0.5]$. For $\Delta = 0.3$, $E_J^{\Delta}$ is determined if
$q \in [0.2854, 0.5]$.